1.0 INTRODUCTION


The Research Triangle Institute, along with faculty and staff members
of the Triangle Universities (Duke University, North Carolina State
University, and the University of North Carolina), is currently conducting
fundamental research in the area of on-line fault detection in modular
digital systems. The program is sponsored by the Naval Electronics Systems
Command, Code 3U4. This document reports on the work accomplished during
the 1977-78 academic year.

1.1 Study Objective

The  overall  objective  of  this  effort  is  to  discover  on-line  fault 
detection,  isolation  and  repair  techniques  and  measures  of  effectiveness 
which  will  be  ultimately  applicable  to  tactical  and  strategic  modular 
digital systems.

1.2  Study  Scope 

This study specifically addresses digital system built-in-test as
opposed to analog system built-in-test. Digital systems of interest are
those which are composed of basic stored program computer elements.
Specifically, such structures have input/output (I/O), control, memory and
arithmetic processing. Of primary interest in this research is the
derivation of results which will be applicable to structures composed of
modular hardware and software building blocks that constitute the elements
of basic digital computers.

To focus the efforts of this research, emphasis is placed on those
techniques which are applicable to on-line (as opposed to off-line) fault
monitoring. The exclusion of off-line fault detection approaches
acknowledges the immense amount of work which has gone on in the area of
off-line, automatic test equipment (ATE).

1.3  Study  Approach 

Within the scope of work stated in Section 1.2, the following approach
is being taken in this study: Continuous and sample fault monitoring
techniques which are applicable to on-line modular digital systems are
being investigated. Criteria for distributing the resulting fault
monitoring and reporting resources throughout modular systems hierarchies,
including both hardware and software, are being derived.

1.4  Overview  of  Study  Results 

The subtasks described in this report were identified in a previous
study as areas where fundamental research could potentially lead to a
better understanding of the basic issues related to modular digital
computer fault detection techniques and measures of effectiveness. The
importance of a modular system approach to military digital system
realization lies in the ability of users of such systems to effect repair
in a timely manner through module replacement. Of fundamental importance is
the ability to detect module faults and to report them in such a manner
that system users can readily identify the faulty member and effect repair.

Subtasks I and II specifically address some fundamental issues
relevant to fault detection and handling techniques. These two subtasks,
commonly referred to as Continuous Monitoring (Subtask I) and Sample
Monitoring (Subtask II), explore error detection and handling techniques.

The emphasis to date on Subtask I has been on identifying the basic issues
in programmable computers that concern hardware and software fault
communication. Of particular interest have been the interface questions
which arise from partitioning fault detection and handling resources
between hardware and software in programmable machines. The investigators
on this subtask view error detection and error handling, which ultimately
lead to recovery, as nearly inseparable.

Subtask II focuses on sample monitoring as a technique which holds
promise of being able at some point to effectively utilize large-scale
integrated (LSI) circuit devices such as microprocessors to detect faults
in modular digital machines. The attractiveness of this approach lies in
its potential universality across a wide variety of digital modules. A
fundamental assumption in this subtask is that programmable LSI devices
which may be used for sample fault monitoring must operate at slower rates
than the processes being monitored and, as a consequence, must sample
those processes to reduce the computational load on the sample monitor
itself.




One technique which appears promising is the use of statistical
monitoring. This technique has been successfully used in off-line testing.
However, in the off-line situation, one has complete control over the
inputs to the unit under test. Herein lies the consequential difference
between off-line and on-line statistical sample monitoring. The sample
monitoring work reported here addresses the issues which are basic to
systems where the inputs are unspecified and non-deterministic.

The third subtask, which has been identified as an area where
additional fundamental work is clearly needed to support fault monitoring
in modular digital systems, is the BIT resource allocation subtask. The
objective of this subtask is to identify ways to effectively distribute BIT
facilities throughout the various hierarchical levels of a modular system,
including the individual module, collections of modules (subsystem), and
system levels. To accomplish this objective one must understand both fault
detection techniques and ways to assess the effectiveness of such
techniques at the hierarchical levels of interest.

The approach taken to this subtask has emphasized both analysis and
simulation as means for identifying and assessing BIT approaches and
effectiveness at various hierarchical levels. Models previously used have
been found to be effective for off-line but inadequate for on-line fault
detection technique evaluation. For example, in the evaluation models
presently being used, error latency and the impact of faulty on-line error
detectors have not been fully considered.

The following subsections present a detailed description of the BIT
research subtasks, including subtask problem definitions and the progress
made on each subtask during the first two semesters of the study. The
section is divided into three major subsections which correspond to the
three University/RTI subtasks. It should be noted that these sections
reflect the emphasis and style of the individual investigators. Section
2.1 was authored by J. W. Gault (NCSU). Section 2.2 was written by P. N.
Marinos (Duke) and K. S. Trivedi (Duke). Section 2.3 was written by D. L.
Parnas (UNC) and Don Bowles (UNC). Appendices A and B are two master's
theses which have resulted from this work so far.
2.1 SAMPLE FAULT MONITORING (NCSU)


by 

James  W.  Gault 

Department  of  Electrical  Engineering 
North  Carolina  State  University 

June  1978 
STANDARD ON-LINE FAULT MONITORING


2.1.1  Introduction 


2.1.1.1 Problem Statement

The origin of the work presented here was an earlier effort at the
Research Triangle Institute to investigate the feasibility of a standard
built-in-test circuit suitable for use with a family of digital electronic
modules [1], [2], [3]. The basic motivation for seeking a standard
approach to BIT is that it would make the inclusion of test circuits a
more natural part of the normal design procedure.

The present direction of the work discussed here has as a beginning
point the assumptions given in Table 1. The approach to on-line fault
detection presently being considered is statistical, since this seems to
offer an effective method of achieving a standard approach to monitoring a
wide variety of different module types.


Table  1.  Problem  Assumptions 


1.  Modular digital electronic equipment which has been fielded, i.e.,
has passed design and manufacturing acceptance tests.

2.  The BIT circuits will monitor on-line operations of a digital
module and will not alter normal processing. The ability to insert
test vectors is not considered.

3.  The  testing  objective  is  to  detect  multiple  as  well  as  single 
logic  faults.  Although  it  is  not  clear  what  the  response  to 
intermittent  faults  will  be,  they  are  not  specifically  excluded. 

4.  The  behavior  of  a module  being  monitored  will  be  characterized, 
assuming  stationary  and  independent  input  statistics.  The  impact 
of  non-stationary  and  dependent  inputs  will  be  studied. 


The  purpose  of  this  research  is  to: 


1.  Develop analytic methods for statistically characterizing digital
electronic modules, and

2.  Evaluate  the  effectiveness  of  statistical  fault  monitoring  as  an  on-line 
built-in-test  technique. 
2.1.1.2 Direction of Effort

The  major  milestones  in  this  project  are  given  in  Table  2. 

Table  2.  Project  Milestones 

1.  Develop  software  capability  to  support  the  investigation: 

a.  Simulation,  and 

b.  Plotting 

2.  Develop  a statistical  model  for  a module  including: 

a.  Input  statistics, 

b.  Fault model, and

c.  Network  output  probabilities  as  a function  of  inputs,  network 
structure,  and  faults. 

3.  Develop  a monitoring  and  detection  strategy  based  on  the  model. 

4.  Develop  measures  of  effectiveness  for  evaluating  the  approach. 

5.  Develop  experiments  to  test  the  methods  developed. 

6.  Evaluate  the  outcome  of  the  effort  and  reconsider  the  models  and 
approach. 

7.  Final  report  of  results. 

2.1.1.3 Organization of the Report

There  are  three  sections  which  follow  in  this  report.  The  next  section 
will  outline  in  some  detail  the  related  results  reported  in  the  literature. 
Section  3 then  summarizes  the  current  status  of  this  research  and  reports 
the  results  obtained  to  date.  Section  4 defines  the  plans  and  approach  for 
work  in  the  next  six  months. 
2.1.2 Background

2.1.2.1 Deterministic Testing

The purpose of this section is to review other published work which is
relevant to the present effort. In addition to providing a general
background and understanding of the problem, the credibility of the
statistical approach is established.

Deterministic  testing  procedures  attempt  to  identify  explicit  test 
vectors  which  detect  the  presence  of  a predetermined  set  of  faults.  This 
approach  falls  short  in  two  major  ways: 

1.  The  set  of  faults  considered  is  often  unreasonable  in  terms  of 
real  failures,  and 

2.  The  input  sequences  required  to  test  practical  networks  (which  are 
typically  sequential)  are  often  extremely  difficult  to  derive. 

In light of this, manufacturers faced with the problem of testing
tremendous volumes of circuits of ever increasing complexity resorted to
using test sequences generated at random (see footnote 1). This approach
calls for the resulting outputs to be compared with the response of a
so-called "gold unit" to the same inputs. There are of course difficulties
with this approach in that the generation and maintenance of the reference
is not a trivial task. Perhaps more troublesome is the fact that the
quality of the test is almost impossible to ascertain. In 1975, work
[5, 6, 7, 8] focusing on the development of a theory for probabilistic
testing began to appear in the literature.

For  the  purposes  of  this  paper  we  will  limit  our  consideration  to  fault 
detection  only.  This  is  a reasonable  limitation  since  the  on-line  monitor 
which  is  envisioned  will  monitor  units  at  the  level  of  a replaceable  module 
and  hence,  detection  and  diagnosis  may  be  considered  synonymous.  Very 
little  work  concerning  fault  diagnosis  has  appeared  in  the  literature  with 
the exception of a paper by Deschizeaux et al. [9].


1. The Fluke Trendar, Data Test Corps, Data Tester Services, and
Microsystems, Inc., MICRO 50U are examples of commercial IC testers which
utilize random sequences.
2.1.2.2 Probabilistic Modeling of Network Behavior
In particular, work by Parker and McCluskey [5,6] developed methods by
which the probabilistic behavior of a network could be described.
Summarized below are the most salient results from this work which are
applicable to the work reported here.

R1.) For combinational networks the probabilistic behavior of an
output may be derived as a function of input probabilities.

R2.) There exists a set of input probabilities such that no two
n-variable combinational functions have the same output probability.

Since, for any particular combinational network, the possible faults simply
map the network to a new function, there exists a set of input
probabilities which may be used to distinguish the good function from any
faulty one.
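Results R1 and R2 can be illustrated with a short Python sketch. It computes the output probability of a small combinational function by brute-force enumeration of its truth table, assuming statistically independent inputs. The function names and the two example functions are hypothetical and chosen only for illustration; this is not the symbolic method of [5,6]. The example shows two functions that are indistinguishable when every input has probability 0.5 but separate once unequal input probabilities are chosen.

```python
from itertools import product


def output_probability(func, input_probs):
    """Return P(func = 1) for independent inputs with the given P(input = 1)."""
    total = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        # Probability of this particular input vector under independence.
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else (1.0 - p)
        if func(*bits):
            total += weight
    return total


def xor_gate(a, b):
    """Hypothetical fault-free function."""
    return a ^ b


def wire_a(a, b):
    """Hypothetical faulty function: the output simply follows input a."""
    return a


for probs in [(0.5, 0.5), (0.3, 0.6)]:
    good = output_probability(xor_gate, probs)
    bad = output_probability(wire_a, probs)
    print(probs, round(good, 3), round(bad, 3))
```

With both inputs at 0.5 the two functions share the same output probability, so an output count alone cannot separate them; with input probabilities 0.3 and 0.6 the output probabilities differ.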


2.1.2.3 Network Statistics

The idea of monitoring a module in an on-line situation is depicted in
Figure 1. The basic objective is to gather statistics on the inputs (x),
states (s), and outputs (z) of a general module and to conclude the present
condition of the module from this data. What statistics then should be
collected? Some of the possibilities are enumerated in Table 3.


Table  3.  Possible  Network  Statistics 

1.  Ones (zeroes) Counting - For x_i ∈ X, s_j ∈ S, and z_k ∈ Z, count
the number of occurrences of ones (zeroes) over an experiment of
length N. Then prob(x_i = 1) = COUNT_i/N. The statistics are the
count values for all x_i, s_j, z_k.

2.  Transition Counting - For x_i ∈ X, s_j ∈ S, and z_k ∈ Z, count the
number of occurrences of transitions (0 -> 1 or 1 -> 0) over an
experiment of length N.

3.  Vector Values - For X, S, and Z as vectors:

a)  Collect the distribution of the inputs as vector values. This
statistic may be kept as 2^m values for an m-element vector or
may be quantized into fewer ranges of values.

b)  Collect the distribution of the number of ones (zeroes)
occurring in each vector over an experiment of length N.

c)  Collect the distribution of the number of transitions in a
vector from one sample value to the next for an experiment of
length N.
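The line-oriented and vector-oriented statistics of Table 3 can be sketched in a few lines of Python. The function names below are hypothetical, and each monitored line is assumed to be available as a list of sampled 0/1 values.

```python
from collections import Counter


def ones_count(samples):
    """Ones counting: number of samples equal to 1 on a single line."""
    return sum(samples)


def ones_probability(samples):
    """Estimate prob(line = 1) as COUNT / N."""
    return ones_count(samples) / len(samples)


def transition_count(samples):
    """Transition counting: number of 0 -> 1 or 1 -> 0 changes between samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def vector_ones_distribution(vectors):
    """Distribution of the number of ones occurring in each sampled vector."""
    return Counter(sum(vector) for vector in vectors)


line = [0, 1, 1, 0, 1]
print(ones_count(line), transition_count(line), ones_probability(line))
print(vector_ones_distribution([(0, 1), (1, 1), (0, 0), (1, 0)]))
```

Each statistic is a single counter per line or per vector weight, which is what makes this family of measurements compact enough for a built-in monitor.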


It  should  be  pointed  out  that  in  order  for  any  statistic  to  be  useable,  the 
following  properties  must  hold: 
Figure 1. General Purpose On-Line Built-In-Test (OBIT) Monitor


1.  The  statistical  behavior  of  network  outputs  or  states  (Z,  S)  must 
be  derivable  a priori  as  a function  of  the  input  statistic  (X). 

2.  The output or state statistic must vary in the presence of faults,
so that the difference between a measured statistic and its a
priori expected value may be used to detect faults, as in R2
stated earlier.

All reported results have used line counting procedures (items 1 and 2 of
Table 3).

Parker [4] examined the usefulness of three types of statistics:
1) ones counting, 2) transition counting, and 3) edge counting. He
demonstrated that edge counting is really a subcase of transition counting
and that the best results can probably be obtained using a combination of
ones and edge counting. This seems intuitively correct since ones counting
gives a measure of the number of inputs and edge counting a measure of the
order of inputs. Hayes [10] provides a very careful theoretical treatment
of  the  effectiveness  of  transition  counting.  This  method  is  often  applied 
by  commercial  testers  since  the  statistics  are  very  compact.  Hayes  shows 
that  there  are  faults  which  are  undetectable  if  only  transition  counting  is 
used. 

2.1.2.4 Evaluating the Effectiveness of Statistical Testing

Often attendant with the notion of statistical testing is the idea that
the easiest input sequence to be used is simply a random sequence. Hence,
there are frequently references in the literature to random testing.
Clearly one of the motivating notions of statistical testing is that the
difficulty of test generation found in deterministic testing can be avoided.

The  question  of  primary  importance  is,  "How  good  a test  does  a random 
sequence  provide?"  Losq  [13]  obtains  general  expressions  for  three  very 
fundamental  parameters.  They  are: 

1.  The probability of escape [ES] - This parameter defines the
probability that a faulty unit will escape detection in an
experiment, given that a failure does exist.

2.  The probability of rejecting a fault-free unit, i.e., a false
alarm [FA] - This parameter defines the probability that a
fault-free unit will be found in error after an experiment, given
that the unit is fault-free.
3.  The test stringency [e] - This parameter defines the window
(Z - e, Z + e) which is used in accepting or rejecting a unit
based on some statistical measure Z.

Losq considers testing in an off-line situation so that the input values
may be controlled. An experiment consists of measuring the statistics of
the output and the input (or controlling it to a specified value) over a
sequence of N samples. For any given network there is an expected value of
the output statistic for a particular input statistic, denoted Ez(x). The
probability that a fault-free unit does not pass an experiment is given by:

\[
\mathrm{prob}(FA) = 1 - \sum_{k = N(E_z(x) - e)}^{N(E_z(x) + e)} \binom{N}{k} \, E_z(x)^{k} \, [1 - E_z(x)]^{N-k}
\]

which can be simplified to

\[
\mathrm{prob}(FA) = \mathrm{erfc}\left( e \sqrt{\frac{N}{2 \, E_z(x) \, (1 - E_z(x))}} \right)
\]

which is bounded by

\[
\mathrm{prob}(FA) \le \mathrm{erfc}\left( e \sqrt{2N} \right)
\]

where erfc is the complementary error function.


This is a dramatic result which indicates that in the limit, the false
alarms depend only upon the test stringency e and the length of the
experiment N. As one might suspect, the penalty for a wide acceptance
window (large e) is an increase in escapes, i.e., higher prob(ES).
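The dependence of the false alarm probability on e and N can be checked numerically. The Python sketch below rests on assumptions that are not stated in the text in this form: the measured statistic is taken to be a ones count over N independent samples (so the count is binomial with success probability Ez(x)), the erfc expression is taken to be the normal approximation e*sqrt(N / (2 p (1 - p))), and the bound is taken to be erfc(e*sqrt(2N)), which follows from p(1 - p) <= 1/4. All function names are hypothetical.

```python
import math


def binomial_pmf(n, k, p):
    """Binomial probability of k ones in n samples, computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def false_alarm_exact(n, p, e):
    """1 minus the probability that the measured fraction lies in [p - e, p + e]."""
    lo = max(0, math.ceil(round(n * (p - e), 9)))
    hi = min(n, math.floor(round(n * (p + e), 9)))
    inside = sum(binomial_pmf(n, k, p) for k in range(lo, hi + 1))
    return 1.0 - inside


def false_alarm_erfc(n, p, e):
    """Normal approximation to the false alarm probability."""
    return math.erfc(e * math.sqrt(n / (2.0 * p * (1.0 - p))))


def false_alarm_bound(n, e):
    """Bound that depends only on the stringency e and the length n."""
    return math.erfc(e * math.sqrt(2.0 * n))


for n in (1000, 10000):
    print(n, false_alarm_exact(n, 0.5, 0.05),
          false_alarm_erfc(n, 0.5, 0.05), false_alarm_bound(n, 0.05))
```

The bound is attained when the expected output statistic is 0.5 and is loose otherwise; in either case it falls rapidly as the experiment length grows.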

The  computation  and  description  of  the  prob  (ES)  is  not  as  direct  as 
that  for  prob  (FA). 


\[
\mathrm{prob}(ES) = \sum_{k = N(E_z(x) - e)}^{N(E_z(x) + e)} \int_{0}^{1} \binom{N}{k} \, E_z^{k} \, (1 - E_z)^{N-k} \, \phi(E_z) \, dE_z
\]

The complexity arises from φ[Ez(x)], which is a density function of faulty
circuits which produce the same output statistic as that expected for the
fault-free circuit.

Implied  by  this  function  is  the  necessity  to: 

1.  Identify  the  faults  which  are  to  be  considered,  and 

2.  Derive  the  network  output  statistics  under  the  influence  of  all 
faults.

One of the major problems in evaluating the effectiveness of statistical
testing is the derivation of φ[Ez(x)]. It is possible, albeit tedious,
to compute φ for combinational networks, since it is possible to compute
Ez(x). Since, in general, for sequential networks, it is not possible to
define Ez(x), it is likewise impossible to obtain φ.
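For a combinational network with a small, explicitly enumerated fault set, the escape probability can be estimated directly. The Python sketch below is an illustration under stated assumptions, not the procedure of [13]: the network is a hypothetical 2-input AND gate with independent inputs of probability x, the fault set is its six single stuck-at faults weighted equally, and a normal approximation is used for the distribution of the measured output statistic.

```python
import math


def window_probability(q, n, lo, hi):
    """P(lo <= measured statistic <= hi) when the true output probability is q."""
    sigma = math.sqrt(q * (1.0 - q) / n)
    if sigma == 0.0:
        # Degenerate case: the output is constant, so the statistic equals q.
        return 1.0 if lo <= q <= hi else 0.0
    upper = math.erf((hi - q) / (sigma * math.sqrt(2.0)))
    lower = math.erf((lo - q) / (sigma * math.sqrt(2.0)))
    return 0.5 * (upper - lower)


def and_gate_fault_statistics(x):
    """Output probabilities under the six single stuck-at faults of an AND gate.

    Order: a s-a-0, a s-a-1, b s-a-0, b s-a-1, output s-a-0, output s-a-1.
    """
    return [0.0, x, 0.0, x, 0.0, 1.0]


def escape_probability(x, n, e):
    """Average probability that a faulty gate still falls inside the window."""
    expected = x * x  # fault-free Ez(x) for the AND gate
    faults = and_gate_fault_statistics(x)
    passed = [window_probability(q, n, expected - e, expected + e) for q in faults]
    return sum(passed) / len(passed)


for x in (0.05, 0.5):
    print(x, escape_probability(x, 1000, 0.05))
```

In this toy case the escape probability is high when the input statistic is near zero, because most faulty versions then produce nearly the same output statistic as the good gate, and it is negligible near x = 0.5.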

2.1.2.5 Statistical Methods for Sequential Networks
The testing of sequential circuits is a significantly more difficult
problem than the testing of combinational networks. Very little has been
presented about statistical methods for treating sequential networks. Two
papers have specifically addressed the problems of statistical treatment of
sequential networks. Shedletsky and McCluskey [14] focus on the idea that a
fault in a network will occur prior to any indication of a failure at an
observable output. This delay is called error latency. When the network is
sequential and the inputs random, then this error latency may be quite
large. More formally:

Definition:  The  error  latency  ELk  for  a fault  Fk  is  the  number  of 
input  vectors  applied  to  a circuit  while  Fk  is  active  until  the  first 
incorrect  output  due  to  Fk  is  observed. 

Since  the  latency  is  dependent  upon  the  sequence  of  inputs  used,  ELk  is 
defined  probabilistically  assuming  the  input  is  a random  process.  The 
basic  definition  may  be  extended  to  develop  the  notion  of  an  acceptable 
detection  test  length  of  random  inputs. 

Definition: The latency interval n(c)k of a fault Fk is the minimum
number of applied inputs necessary to achieve a probability c of
observing an error due to fault Fk.

This reference defines an ELM model which may be used to establish the
test length n required to achieve a minimum test quality of c over a set
of faults.
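The latency interval has a closed form in the simplest case. The Python sketch below assumes, purely for illustration, that each applied input independently exposes the fault with a fixed probability q, so that the latency is geometrically distributed. This is a simplification and is not the ELM model of [14]; the function names are hypothetical.

```python
import math


def detection_probability(q, n):
    """Probability of observing at least one error within n inputs."""
    return 1.0 - (1.0 - q) ** n


def latency_interval(q, c):
    """Smallest n such that detection_probability(q, n) >= c."""
    if not (0.0 < q <= 1.0) or not (0.0 <= c < 1.0):
        raise ValueError("require 0 < q <= 1 and 0 <= c < 1")
    if q == 1.0:
        return 0 if c == 0.0 else 1
    n = max(0, math.ceil(math.log(1.0 - c) / math.log(1.0 - q)))
    # Guard against floating-point rounding at the boundary.
    while detection_probability(q, n) < c:
        n += 1
    while n > 0 and detection_probability(q, n - 1) >= c:
        n -= 1
    return n


print(latency_interval(0.01, 0.95))
```

Under this assumption a fault that is exposed by one input in a hundred needs roughly three hundred random inputs to reach a 0.95 detection probability, which indicates how quickly test length grows for hard-to-excite faults.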


On a somewhat different track and more as an extension to work in
combinational networks, Parker and McCluskey [18] describe techniques for
deriving the output probabilities of sequential networks using regular
expressions. A regular expression is a precise, unambiguous language for
describing finite automata. While no pretense is made that this approach is
generally applicable, it does indicate a sense of direction for sequential
circuit analysis.


2.1.3  Status  and  Results  of  Research 
2.1.3.1 Problem Definition

The objective of this research is to define and evaluate a standard
approach to on-line fault detection using built-in-test elements. This
objective is now focused on a statistical approach to on-line monitoring as
described in Chapter 2.

The  standard  testing  strategy  being  developed  is  given  in  Table  4. 

Table  4.  A Standard  BIT  Strategy 

For  a given  module 
A PRIORI 

1.  Define  a set  of  monitor  points.  In  general,  these  will  include 
inputs  (X),  outputs  (Z),  and  internal  state  values  (S). 

2.  Derive for the states and outputs to be monitored, an expected
value Es(x) and Ez(x) as a function of the input statistic.

3.  Derive a test stringency e and test length n based on desired test
quality, false alarm rate, and escape rate. This will involve the
definition of a fault model and the fault density functions φ(X,Z),
φ(X,S).

ON-LINE

4.  During system execution, the fault monitor must then:

a.  For an experiment of length n gather statistics on X, S, and
Z.

b.  Indicate a failure if the measured statistic (e.g., mz) is
outside the acceptance window for the expected value of this
statistic as a function of the measured input statistic mx.

FAIL if mz > Ez(mx) + e or
        mz < Ez(mx) - e
PASS otherwise
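The on-line portion of Table 4 can be sketched as a small Python monitor. The module used here is hypothetical: its output is z(t) = x(t) AND x(t-1), for which independent inputs with probability x give Ez(x) = x^2. The function names, the stuck-output fault model, and the chosen values of n and e are assumptions made for illustration only.

```python
import random


def expected_output(mx):
    """A priori model Ez(x) for the hypothetical module z(t) = x(t) AND x(t-1)."""
    return mx * mx


def run_experiment(n, p, rng, stuck_output=None):
    """Simulate n samples; stuck_output forces the output line to 0 or 1."""
    xs, zs = [], []
    prev = 0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        z = (x & prev) if stuck_output is None else stuck_output
        xs.append(x)
        zs.append(z)
        prev = x
    return xs, zs


def monitor(xs, zs, e):
    """Step 4 of the strategy: compare measured mz with Ez(mx) +/- e."""
    mx = sum(xs) / len(xs)
    mz = sum(zs) / len(zs)
    ez = expected_output(mx)
    return "FAIL" if (mz > ez + e or mz < ez - e) else "PASS"


rng = random.Random(1)
print(monitor(*run_experiment(10000, 0.5, rng), e=0.05))
print(monitor(*run_experiment(10000, 0.5, rng, stuck_output=1), e=0.05))
```

Because the expected value is evaluated at the measured input statistic mx rather than at a design-time constant, the monitor does not need control over the inputs, which is the distinguishing requirement of the on-line case.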

2.1.3.2 Measures of Effectiveness

One of the most fundamental issues in this research is to establish
measures of effectiveness which can be used to accept or reject statistical
monitoring as an on-line strategy. The primary measures to be considered
were given in Section 2.4. These measures are shown pictorially in Figure
2. In addition to test quality, false alarm rate, and escape, the error
latency of an experiment will also be considered. These parameters are
defined in Table 5. The more classic views of BIT measures are given in
Table 6 and will also be considered.

For any particular experiment, once the length n and stringency e are
known, the probability of a false alarm may be computed as it does not
depend (in the bound) on run time statistics. A plot of P(FA) as a function
of n for two typical values of e is given in Figure 3.

The evaluation of the probability of escape depends upon N, e, φ(x) and
the run time input statistic. For a particular simple sequential network,
the probability of escape, which results if the occurrences of state B are
used as the monitored statistic, is shown in Figure 4. This figure may be
interpreted in the following way. For given values of N and e (N = 10,000,
e = .01), if the input statistics are measured and the number of entries
into state B is taken as the measured output, then the probability that a
failure will go undetected depends very sharply on the measured input
statistic mx. If, in Figure 4, mx is around 0.5, then the probability of
escape is quite low. The sensitivity of this parameter to experiment
length and stringency is indicated by the plot of Figure 5. For the
example circuit, which converges to steady state statistical values quite
quickly, an order of magnitude change in experiment length (N = 1,000 to N
= 10,000) makes little or no difference in the probability of escape. A
change in the stringency from e = .05 to e = .01 causes a significant
improvement. Recall that such a change in stringency will cause P(FA) to
degrade. It is clear then that the selection of a stringency involves a
trade-off between the probability of escape and false alarms. Both
parameters improve with an increase in the experiment length. However,
there may be a cost associated with long experiments.
[Figure: BIT Outcomes]


Table 5. Typical BIT Measures


EXAMPLE  MEASURES: 


% FUNCTION  TESTED 
CYCLES  FOR  TESTING 
MTBF  CHANGE 
CONFIDENCE  LEVEL 
% HARDWARE  FOR  BIT 
BIT  FORM,  FIT,  & POWER  REQUIREMENT 
Table 6. Probabilistic Measures of OBIT Approaches


MEASURES:

MTDF (Mean Error Latency, Sampling Ratio, Test Quality)

Test Quality: The probability that a test (N) will detect a failure
if one exists

False Alarm Rate: The probability that a test (N) will detect a failure
when none exists

Escape: The probability that a test (N) will pass when a failure exists
[Figure panels: False Alarm; Probability of Escape]

Figure 5. Sensitivity to Changes in Experiment Length N and Stringency e
2.1.3.3 Computation of Fault Density Functions

In order to be able to compute the probability of escape and hence the
test quality, it is essential to be able to derive the fault density
function for a module. As stated in Section 2.4, this is one of the primary
difficulties and hence objectives of this research. A fault density
function describes, for a particular statistic, the number of faults which
produce a particular value of output statistic for a particular value of
input statistic. Peaks in the plotted function represent places where a
large number of faults produce identical behavior as far as this particular
statistic is concerned. Figures 6a and 6b show fault density functions for
two output statistics. If a small value of the input were measured in an
experiment together with a small value of the output statistic (say state B
of Figure 6a), then, since a large number of fault functions (high density)
are present in this area, it will be difficult to distinguish which
function produced the result. Since we can compute the probability of
state B as a function of x for the fault-free network, we can determine:

1.  If the fault-free value is within ± e of the measured value, then
either the module is functioning as a good machine or as one of the
faulty functions in the neighborhood. Such a condition will result
in a high probability of escape for these faulty functions. Note
that there may be another output statistic which can be used to
refine the pass/fail decision.

2.  If the fault-free value is outside the ± e window, then we will
declare a failure. The likelihood of this failure indication being
a false alarm is a function of the experiment length and the size
of e. It should be pointed out that the assumption of stationary
inputs is central to this result.

The foregoing discussion has introduced the fault density function and
its utility. The question at hand is how it can be computed.

If, for a particular statistic Z, we can compute the output probability
Z(x) for the good network, then the network function may be perturbed
by a fault f and Zf(x) computed. This is done in Figure 7 for an example
circuit with 9 assumed faults. The number of failures producing a
particular output, quantized to some practical size (.05 was used in
Figure 6), may be accumulated and plotted as a third dimension.
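The accumulation step can be sketched in Python as a two-dimensional histogram. The circuit below is a hypothetical 2-input AND gate with independent inputs of probability x and its six single stuck-at faults; it is not the example circuit of Figure 7. Only the quantization step of .05 is taken from the text, and the function names are assumptions.

```python
from collections import Counter

BINS = 20  # 1 / 0.05: quantization step of .05 on both axes


def fault_free(x):
    """Fault-free output probability Z(x) of the AND gate."""
    return x * x


FAULTY_FUNCTIONS = [
    lambda x: 0.0,  # input a stuck-at-0
    lambda x: x,    # input a stuck-at-1 (output follows b)
    lambda x: 0.0,  # input b stuck-at-0
    lambda x: x,    # input b stuck-at-1 (output follows a)
    lambda x: 0.0,  # output stuck-at-0
    lambda x: 1.0,  # output stuck-at-1
]


def fault_density(faulty_functions, bins=BINS):
    """Count faults per (input bin, output bin) cell."""
    density = Counter()
    for i in range(bins + 1):
        x = i / bins
        for zf in faulty_functions:
            density[(i, round(zf(x) * bins))] += 1
    return density


density = fault_density(FAULTY_FUNCTIONS)
x_bin = 10  # x = 0.5
good_bin = round(fault_free(x_bin / BINS) * BINS)
print(good_bin, density[(x_bin, good_bin)])
```

A cell with a high count that coincides with, or lies within the acceptance window of, the fault-free value marks a region of the input statistic where escapes are likely.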
This approach is clearly impractical for any reasonable size problem and
other methods are being studied. However, as a starting point we will
continue to develop methods for describing Z(x) for sequential networks and
then use Z(x) to generate φ(z) by simulation until more analytical
approaches can be developed.

2.1.3.3.1 Sequential Primitives

In  an  attempt  to  derive  z(x)  for  sequential  networks,  we  will  consider 
the  derivations  for  simple  sequential  primitives  and  then  look  for  ways  to 
form  more  complex  component  functions. 

Simple  Controllers  - Simple  controllers  are  defined  by  state  tables,  state 
diagrams  or  sequential  networks  from  which  a state  table  may  be  derived. 

The operation of the sequential machine under random inputs may be
treated as a Markov process since the present state depends only upon the
previous state and the input. A transition matrix P may be formed from the
state table (diagram), and the state probability after a long set of inputs
may be found from P^n with n → ∞. An example of these calculations for
a simple, single input network is shown as example 1. The fault density
function can be found for a network by evaluating the state diagram (and
hence the P matrix) which results from the occurrence of each fault, and
then evaluating the output/state probabilities for the new machine. The
fault density functions for states A and C of the network of example 1 and
10 assumed faults were given earlier as Figures 6a and 6b.
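
The P^n calculation described above can be sketched in a few lines of plain
Python. This is only an illustration: the matrix entries are an assumption,
read from the partly illegible table of example 1 (rows are present states
A to D, p is the probability of a 1 input), and the function names are
hypothetical.

```python
def transition_matrix(p):
    """Transition matrix assumed for the machine of example 1.

    Rows are present states A, B, C, D; columns are next states.
    p is the probability that the single input equals 1.
    """
    q = 1.0 - p
    return [
        [p,   q,   0.0, 0.0],  # A: input 1 -> A, input 0 -> B
        [0.0, q,   p,   0.0],  # B: input 0 -> B, input 1 -> C
        [p,   0.0, 0.0, q],    # C: input 1 -> A, input 0 -> D
        [0.0, q,   p,   0.0],  # D: input 0 -> B, input 1 -> C
    ]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Return m raised to the integer power n (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


def closed_form(p):
    """State probabilities (A, B, C, D) as listed in example 1."""
    return [p**2, 1 - 2*p + 2*p**2 - p**3, p*(1 - p), p*(1 - p)**2]


p = 0.3
for row in mat_pow(transition_matrix(p), 3):
    print([round(v, 4) for v in row])
print([round(v, 4) for v in closed_form(p)])
```

For this assumed matrix every row of P^3 equals the closed-form vector,
which is consistent with the remark in example 1 that P^n converges for
n = 3. A faulted machine would be handled the same way, by editing the
rows that the fault changes and recomputing.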

Flip-Flops/Shift Registers/Counters - The method for obtaining the output
probabilities given in conjunction with example 1 is applicable to any
sequential circuit. Since it is computationally infeasible for circuits of
practical size, a new approach is considered here. The basic idea is to
define the output and fault density functions for well-defined sequential
primitives. The objective is then to derive these same functions for more
complex composites of these primitives by some composition of functions.
The output functions for a flip-flop (FF), a 2-bit shift register and a
3-bit counter are derived below.

The  notion  of  a composite  function  is  demonstrated  by  first  deriving 
the  output  function  for  an  FF  and  then  using  this  result  to  derive  the 
behavior  of  a shift  register  which  is  comprised  of  FF's.  A second  approach 
is  then  used;  the  shift  register  output  function  is  derived  directly;  and 
Example 1.  Computation of Output Probability for Sequential Machines

State diagram: [figure]

Transition matrix: the probability of an input = 1 is p, of an input = 0
is 1-p. Rows are present states and columns are next states, both in the
order A, B, C, D.

\[
P = \begin{pmatrix}
p & 1-p & 0 & 0 \\
0 & 1-p & p & 0 \\
p & 0 & 0 & 1-p \\
0 & 1-p & p & 0
\end{pmatrix}
\]

As n → ∞, every row of P^n becomes

\[
\begin{pmatrix} p^2 & 1-2p+2p^2-p^3 & p(1-p) & (1-p)^2 p \end{pmatrix}
\]

P^n converges for n = 3.

If states are taken as outputs, the probability of the state outputs as a
function of the input probability is given by

\[
\begin{aligned}
P_A &= p^2 \\
P_B &= 1-2p+2p^2-p^3 \\
P_C &= p(1-p) \\
P_D &= p(1-p)^2
\end{aligned}
\]
the results of the two approaches are shown to agree.
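Example 2 - Derivation of an FF Output Function

[Figure: clocked flip-flop with inputs J and K, and its two-state diagram
with states A and B]

j = prob J=1,  k = prob K=1

\[
P = \begin{pmatrix} 1-j & j \\ k & 1-k \end{pmatrix},
\qquad
P^n \xrightarrow[n \to \infty]{} \frac{1}{j+k}
\begin{pmatrix} k & j \\ k & j \end{pmatrix}
\]

\[
p(A) + p(B) = 1, \qquad p(A) = \frac{k}{j+k}, \qquad p(B) = \frac{j}{j+k}
\]

Note: a similar result may be obtained by solving the simultaneous
equations which can be directly written from the state diagram.

\[
\begin{aligned}
p(A) &= p(A)(1-j) + p(B)\,k \\
p(B) &= p(A)\,j + p(B)(1-k) \\
p(A) + p(B) &= 1
\end{aligned}
\]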
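
As a small check of example 2, the sketch below compares the closed-form
flip-flop probabilities with the result of applying the state equations
repeatedly. The function names are hypothetical, and the labelling of
state A as Q = 0 and state B as Q = 1 is an assumption inferred from the
equations.

```python
def ff_state_probabilities(j, k):
    """Closed-form steady-state probabilities (p(A), p(B)) of the FF.

    j = prob(J = 1), k = prob(K = 1). State A is assumed to be Q = 0
    and state B to be Q = 1.
    """
    if j + k == 0:
        raise ValueError("j + k must be positive")
    return k / (j + k), j / (j + k)


def iterate_state_equations(j, k, steps=200):
    """Apply the two state equations repeatedly, starting in state A."""
    pa, pb = 1.0, 0.0
    for _ in range(steps):
        pa, pb = pa * (1 - j) + pb * k, pa * j + pb * (1 - k)
    return pa, pb


print(ff_state_probabilities(0.3, 0.6))
print(iterate_state_equations(0.3, 0.6))
```

The iteration converges geometrically with ratio |1 - j - k|, so a few
hundred steps are far more than needed for inputs that are not nearly
deterministic.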


Example 3 - Two Alternative Derivations for a 2-Bit Shift Register

a.  Modeled as a shift register

[Figure: 2-bit shift register; for MODE = 0, that is, shift]

State code:

    State   QB   QA
    A       0    0
    B       0    1
    C       1    1
    D       1    0

Statistics which might be collected are:

1.  The occurrence of activity on a single output A or B (ones,
    transitions), or

2.  The occurrence of a vector value (weight, transitions, value).

The Markov model lends itself most naturally to the vector value
statistic (namely, the occurrence of states).
The equations, taking into account parallel loading, with
m = mode probability, s = serial input probability, and
a, b = parallel input probabilities, are

\[
\begin{aligned}
p(A) &= \bar{s}\bar{m} - s\bar{s}\bar{m}^2 - \bar{s}am\bar{m} + \bar{b}\bar{a}m \\
p(B) &= s\bar{m} - s^2\bar{m}^2 - sam\bar{m} + \bar{b}am \\
p(C) &= s^2\bar{m}^2 + asm\bar{m} + mab \\
p(D) &= \bar{m}^2 s\bar{s} + m\bar{m}\bar{s}a + b\bar{a}m
\end{aligned}
\]

where \bar{x} = (1-x).

Statistics for the individual flip-flop outputs, q_A = prob Q_A = 1 and
q_B = prob Q_B = 1, may be derived using

\[
q_A = P_B + P_C, \qquad q_B = P_C + P_D
\]

resulting in the equations

\[
q_A = s\bar{m} + ma, \qquad q_B = \bar{m}^2 s + am\bar{m} + mb
\]
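
The state equations of part a can be checked numerically. The sketch below
is an illustration under stated assumptions: with probability m the
register loads the parallel bits (Q_A from a, Q_B from b), otherwise it
shifts (Q_A from the serial input, Q_B from the old Q_A), and all inputs
are independent. The function names are hypothetical.

```python
def closed_form_states(s, m, a, b):
    """State probabilities (A, B, C, D) from the equations of part a."""
    sb, mb, ab, bb = 1 - s, 1 - m, 1 - a, 1 - b
    p_a = sb*mb - s*sb*mb**2 - sb*a*m*mb + bb*ab*m
    p_b = s*mb - s**2*mb**2 - s*a*m*mb + bb*a*m
    p_c = s**2*mb**2 + a*s*m*mb + m*a*b
    p_d = mb**2*s*sb + m*mb*sb*a + b*ab*m
    return p_a, p_b, p_c, p_d


def iterate_chain(s, m, a, b, steps=50):
    """Propagate the state distribution; states are keyed by (QB, QA)."""
    prob = {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0, (1, 0): 0.0}
    for _ in range(steps):
        nxt = dict.fromkeys(prob, 0.0)
        for (qb, qa), weight in prob.items():
            for new_qb in (0, 1):
                for new_qa in (0, 1):
                    # Parallel load: both bits come from the a, b inputs.
                    load = (m * (a if new_qa else 1 - a)
                              * (b if new_qb else 1 - b))
                    # Shift: QA takes the serial bit, QB takes the old QA.
                    shift = ((1 - m) * (s if new_qa else 1 - s)
                             * (1.0 if new_qb == qa else 0.0))
                    nxt[(new_qb, new_qa)] += weight * (load + shift)
        prob = nxt
    # Return in the order A, B, C, D of the state code table.
    return prob[(0, 0)], prob[(0, 1)], prob[(1, 1)], prob[(1, 0)]


print(closed_form_states(0.4, 0.2, 0.7, 0.1))
print(iterate_chain(0.4, 0.2, 0.7, 0.1))
```

Under these assumptions the two results agree, and the sums P_B + P_C and
P_C + P_D reproduce the expressions given for q_A and q_B.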


b.  Modeled as a composite network


similar) by combining the FF behavior with the combinational network output
x. The FF equation derived earlier can be simplified since J = \bar{K} = x,
hence

\[
p(\bar{Q}_A) = \frac{1-x}{x + (1-x)} = 1-x
\]

Now, for the combinational network, the output probability as a function of

\[
q_B = q_A - q_A m + mb \quad \text{or} \quad
q_B = \bar{m}^2 s + am\bar{m} + mb
\]

These results agree with the derivation in part a. The state probabilities
can be obtained from the equations

\[
\begin{aligned}
p(A) &= \bar{q}_A \cdot \bar{q}_B & \qquad p(C) &= q_A \cdot q_B \\
p(B) &= q_A \cdot \bar{q}_B & \qquad p(D) &= \bar{q}_A \cdot q_B
\end{aligned}
\]

Example 3 demonstrates the composition procedure. The important
implication is that once derived, a function may then be used in more
complex situations. In this manner a catalog of functions and their
statistical behavior may be used to derive general modules just as packages
of a logic family are used to realize the modules. Example 4 demonstrates
the derivation of the statistical profile for the 3-bit counter.


[Figure: counter circuit with CLOCK input]

Defining the various modes as

    R  = low, reset
    PT = high, count enable
    L  = low, parallel load

and

    α = PT·L·R
    β = PT·L·R
    γ = L·R

results in state values of


2.1.3.3.2  Simulation and Experimentation

Simulation and experimentation programs have been written and run to
derive fault density functions for some simple circuits with a limited
number of assumed faults.

The network of example 1 was simulated with ten single stuck faults
and the results plotted. These plots are given as Figures 6a and 6b. In
addition, simulation experiments were run to verify the state probability
results obtained analytically. Several examples were run showing very good
agreement, i.e., differences of less than ±.06 between simulated and
analytical results. The results often converged to this range after as
few as 100 input conditions. Example 5 shows the results of a sample run.
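
A simulation experiment of this kind can be sketched as follows. This is
not the program used in the study: the next-state table is an assumption
taken from the transition matrix of example 1, the pseudo random source
is Python's standard generator, and the function names are hypothetical.

```python
import random

# (present state, input bit) -> next state, assumed from example 1.
NEXT_STATE = {
    ("A", 1): "A", ("A", 0): "B",
    ("B", 0): "B", ("B", 1): "C",
    ("C", 1): "A", ("C", 0): "D",
    ("D", 0): "B", ("D", 1): "C",
}


def simulate(p, n_vectors, seed=1):
    """Apply n_vectors random inputs and return state visit frequencies."""
    rng = random.Random(seed)
    state = "A"
    visits = dict.fromkeys("ABCD", 0)
    for _ in range(n_vectors):
        bit = 1 if rng.random() < p else 0
        state = NEXT_STATE[(state, bit)]
        visits[state] += 1
    return {name: count / n_vectors for name, count in visits.items()}


def analytic(p):
    """State probabilities from the closed forms of example 1."""
    return {"A": p**2, "B": 1 - 2*p + 2*p**2 - p**3,
            "C": p*(1 - p), "D": p*(1 - p)**2}


simulated = simulate(0.5, 10000)
computed = analytic(0.5)
for name in "ABCD":
    print(name, round(simulated[name], 4), round(computed[name], 4))
```

Rerunning with a smaller n_vectors gives a rough feel for how quickly the
visit frequencies settle toward the computed values.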

2.1.3.4  Input Stationarity

The effectiveness of the methods proposed here is vulnerable to the
validity of the assumptions which are applied in order to perform the
required analysis. Probably the most difficult assumption to verify is
that of input stationarity. The truth of the matter is that we do not have
any clear perception of the behavior of the inputs to a general digital
module operating on-line in situations which are clearly data and
application dependent. A lack of stationarity will directly impact the
validity of a predicted value of an output statistic. This can be
accommodated by increasing the acceptance window (the ±ε stringency). This,
of course, results in an increased likelihood of escape and a degradation
in test
Example 5:  Simulation Behavior As Compared To Analytic Results

[Figure: state diagram; legible labels: X1 = 1, X2-X3-X4 = (1-p)^2 p]

\[
P_A = \frac{p}{p^2-3p+3}, \quad
P_B = \frac{1-p}{p^2-3p+3}, \quad
P_C = \frac{1-p}{p^2-3p+3}, \quad
P_D = \frac{(1-p)^2}{p^2-3p+3}
\]

Input          # of Applied           Simulated Average   Computed State
Probabilities  Vectors        State   Visitations         Probabilities

               10,000         A       0.0358              0.03690
                              B       0.3480              0.33210
                              C       0.3244              0.33210
                              D       0.2918              0.29889

               10,000         A       0.2815              0.28571
                              B       0.2814              0.28571
                              C       0.2900              0.28571
                              D       0.1471              0.14286

               10,000         A       0.8069              0.81081
                              B       0.0907              0.09009
                              C       0.0921              0.09009
                              D       0.0103              0.00901
quality. In order to obtain some information concerning the sensitivity of
the network statistics to nonstationary inputs, the model shown in Figure
8 was used. The function used to control the nature of the nonstationarity
is given in Figure 9. The beginning and ending sections are sinusoidal and
the three break times t1, t2, t3 may be controlled to create
the desired function. Example 6 shows two separate cases and the results.
The circuit of example 1 was used.
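
The input probability control model can be sketched as a pseudo random
number compared against a time-varying threshold. The exact shape of the
profile is not recoverable from the text, so the version below is an
assumption: a quarter-sine rise from 0 to theta_max over [0, t1], a
constant section over [t1, t2], and a quarter-sine fall to 0 over
[t2, t3]. All names are hypothetical.

```python
import math
import random


def theta_profile(t, theta_max, t1, t2, t3):
    """Assumed threshold profile with sinusoidal first and last sections."""
    if t < 0 or t > t3:
        return 0.0
    if t < t1:
        return theta_max * math.sin(0.5 * math.pi * t / t1)
    if t <= t2:
        return theta_max
    return theta_max * math.sin(0.5 * math.pi * (t3 - t) / (t3 - t2))


def generate_inputs(theta_max, t1, t2, t3, seed=1):
    """Emit one input bit per time unit; prob(bit = 1) follows the profile."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta_profile(t, theta_max, t1, t2, t3) else 0
            for t in range(t3)]


bits = generate_inputs(theta_max=0.5, t1=100, t2=900, t3=1000)
print(sum(bits) / len(bits))  # observed input probability over the run
```

The generated bit stream can be fed to a network simulation, and the
observed input probability then used as the stationary value when
computing the theoretical state probabilities for comparison.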


Figure 8.  Input Probability Control Model

[Block diagram: a pseudo random number generator and a threshold function
θ that controls the probability, prob(x), of the input fed to the network]

a.  For controlled input, prob(x): θ = constant.

b.  For random input, θ is not used.

c.  For nonstationarity, θ(t) is used.
Example 6

A.  Model:  θMax = .5,  t1 = 100 units,  t2 = 900 units,  t3 = 1000 units

Figure 9.  Non-Stationary Profile θ(t)

Results:

                  Theoretical*    Simulation

    Input P(x)                    .458
    State P(A)    .209            .239
    State P(B)    .407            .437
    State P(C)    .248            .219
    State P(D)    .134            .105

*treating P(x) as stationary at resulting simulated value.


B.  Model:  θMax = .5

    P(x)          .267


2.1.3.5  Fault Models

At the present time, stuck faults at package pins have been used as the
fault model. In the future we will consider other models and intermittents.
For the present time the stuck fault model is adequate since the focus is
on other issues.


2.1.3.6  Summary

A summary of the results and capabilities developed to date is given
in Table 5.


Table  5.  Summary  of  Results 

1.  Demonstrated  simulation  capability. 

2.  Demonstrated  plotting  capability. 

3.  Detailed problem definition refinement.

4.  Identification of measures of effectiveness with a quantitative
formulation.

5.  Definition of a standard approach to on-line fault monitoring.
Detailed steps required to characterize and test a module
statistically.

6.  Computation  method  for  probabilistic  output  functions  and  fault 
density  functions  for 

Simple  controllers,  and 
Sequential  primitives 

7.  Tentative  definition  of  a fault  model. 

8.  Experimental  analysis  of  input  stationari ty. 

2.1.4  Plans  and  Projections 

The results produced to date provide considerable encouragement as to
the usefulness of a statistical model in on-line fault monitoring. There
are a large number of questions which can be posed as a result of the effort
to date. The work which will be pursued falls into three major related
categories:

1.  Statistical characterization of general digital electronic modules.

2.  Evaluation  of  the  cost  and  effectiveness  of  on-line  statistical 
monitoring  as  a standard  BIT  strategy. 

3.  Experimentation  and  validation  of  the  theoretical  concepts. 

More detailed goals for each category are given in Table 6. The immediate
emphasis is on items 1a, 2a, and 2b. These steps involve the development
of the analytic tools required to model a module and evaluate the
statistical approach in a tentative way. The other goals are objectives
which should bring the next level of understanding and maturity to the
approach.

Table 6.  Research Goals

1.  Statistical  characterization  of  a module. 

a.  Develop  analytic  methods  for  describing  module  state  or  output 
probabilities  as  a function  of  input  probability. 

Specifically  consider  sequential  modules. 

b.  Investigate  the  choice  of  statistics  which  are  most  effective 
for  describing  a module. 

c.  Study  alternative  strategies  for  pass/fail  determination. 

d.  Develop computationally feasible methods for deriving experiment
stringency and length for particular modules.

2.  Evaluation of cost and effectiveness.

a.  Evaluate various fault models and their statistical
characterization.

b.  Develop computationally feasible methods for obtaining: 1) the
fault density function for a module, given the fault model from
1a, and 2) the probability of escape and false alarm as
functions of experiment length and test stringency.

c.  Study the sensitivity of the approach with regard to statistical
properties of the module inputs.

3.  Experimentation and validation of theoretical concepts.

a.  For the present time this will involve simulation experiments
designed to model modules statistically and to determine if
inserted faults are detectable by a shift in statistical
properties.

b.  Experiments to: 1) determine the validity of the concept for
nonstationary inputs, and 2) study the performance of monitoring
as a function of test stringency and experiment length.


We are very close to having the concepts developed which are required
to:

1.  Take a simple module and characterize it statistically, i.e.,
define n, ε, s(x), z(x) for all s, z,

2.  Define the theoretical performance which will be obtained for the
network with an established set of faults, experiment length (N),
and stringency (ε), and

3.  Demonstrate experimentally that simulated faults can be detected
and that good machines are not rejected.

The development, documentation and utilization of the software tools
required to support this effort are an important corollary effort.
2.2.1.1  Introduction

This paper is concerned with the analysis and design of on-line
Built-In-Test (BIT). Such systems are characterized by on-line fault
monitoring, and therefore a study of the effectiveness of on-line fault
monitors is important [1,2]. Existing models of systems analysis [3,4] are
inadequate for modeling systems with on-line fault monitoring since they
assume that fault detection occurs in zero time and that the fault monitor
never fails. Our models will allow a finite detection latency, an imperfect
fault monitor, multiple fault monitors, and multiple classes of faults.

The analysis problem occurs when the system structure is specified and
we are interested in evaluating the performance of the system. Such an
analysis will be probabilistic in nature since the failure modes of various
components of the system are probabilistic. If a repair facility is
included in our model (i.e., the system is repairable or maintained), then
the performance metric of interest is the steady state system availability.
On the other hand, if the system is non-maintained (or non-repairable),
the performance measure of interest is the system reliability as a function
of the mission time.

Due to finite detection latency, it is possible that a fault has
occurred in the system but has not yet been detected. Such a state of the
system is clearly undesirable. The purpose of an on-line fault monitor is
to reduce the probability that the system is in the undesirable state. We
will give explicit expressions for the effectiveness of a fault monitor in
achieving this goal. Our analysis will cover both maintained and
non-maintained systems.

The problem of system design is to configure the optimal system for
the stated purpose. In our context, we are interested in choosing a fault
monitor that yields a system with optimum cost-performance. The trade-off
is between the cost of the monitor and the cost associated with the time
the system spends in the undesirable state. In several simple cases, we
will give closed form solutions that characterize the optimal fault monitor.


The basic constituent of the system that we consider is called a
module, as shown in Figure 1. Two types of modules will be considered. One
type is the non-maintained (or non-repairable) module M and the other type
is the maintained (or repairable) module M'. Module M consists of the
functional unit U and its on-line fault detector D. The module M' consists
of the functional unit U, the detector D and a repair facility R. An
example of a functional unit is an arithmetic unit, and the corresponding
detector could be its modulo-3 checker. If the functional unit is a
complex processing unit, then the detector could be a software routine
executed on a microprocessor. Thus, both continuous and sampled on-line
fault detectors (or monitors) are modeled. If a system consists only of
non-maintained modules, then it is a non-maintained system and the
performance measure of interest is its reliability for a given mission
time. On the other hand, for a system consisting of maintained modules,
the performance measure of interest is its steady state availability [3].

Consider a series-parallel system consisting of s serial stages where
the ith stage has n_i identical modules in parallel (see Figure 2).
Assume that the failures of all units are independent of one another. Now
if the system is non-maintained, then its reliability is given by [3]

\[
R_{system}(t) = \prod_{i=1}^{s} \left[ 1 - \left(1 - R_i(t)\right)^{n_i} \right] \tag{1}
\]

where R_i(t) is the reliability of any module at the ith stage.

Next, consider a similar maintained system and assume that each module
has its own repair facility. Then the system availability is given by [3]

\[
A_{system} = \prod_{i=1}^{s} \left[ 1 - \left(1 - A_i\right)^{n_i} \right] \tag{2}
\]

where A_i is the availability of any module at the ith stage.
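
Formulae (1) and (2) have the same form, so one small function covers
both. The sketch below is illustrative only; the function name and the
list-of-pairs input format are assumptions.

```python
def series_parallel(stages):
    """Evaluate formula (1) or (2) for a series-parallel system.

    stages is a list of (module_value, n_parallel) pairs, one per serial
    stage, where module_value is the reliability R_i(t) at a fixed t or
    the availability A_i of a single module at that stage.
    """
    total = 1.0
    for module_value, n_parallel in stages:
        # A stage fails only if all of its n parallel modules fail.
        total *= 1.0 - (1.0 - module_value) ** n_parallel
    return total


# Two stages: a duplicated module of value 0.9 in series with a single
# module of value 0.8.
print(series_parallel([(0.9, 2), (0.8, 1)]))
```

Because the stages enter only through their individual module values, the
rest of the analysis can concentrate on a single module.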
M:  NON-MAINTAINED MODULE
M': MAINTAINED MODULE

Figure 1.  Basic System Module
From the above discussion we may conclude that for series-parallel
systems of independent modules, it is enough to analyze the reliability (or
availability) of individual modules. Once we have computed these, a
trivial application of one of the formulae (1) or (2) given above yields
the system performance measure desired. Therefore, we will only analyze the
performance of module M or module M'.

Methods of analysis and design of these two types of units will be
studied in Sections II and III, respectively.

2.2.1.2  Analysis

In this section, we will first discuss the analysis of a maintained
module and will then consider the analysis of a non-maintained module.

Analysis  of  a Maintained  Module 

We will first present the well-known analysis of a simple maintained
module, devoid of the on-line detector D. Throughout the subsequent
discussion we will assume that the time between two successive failures of
the functional unit U is exponentially distributed with mean 1/λ. Thus, the
failure rate is λ and the Mean-Time-Between-Failures (MTBF) is 1/λ. We
assume that the time to repair is exponentially distributed with mean 1/μ.
Thus the repair rate is μ and the Mean-Time-To-Repair (MTTR) is 1/μ.

The module has two possible states, F (failed) and W (working
properly). The state diagram of the module is shown in Figure 3. Let P_F be
the steady state probability that the module is in state F. Similarly, let
P_W be the steady state probability that the system is in state W. It
can be shown that [3]

\[
P_W = \frac{\mu}{\lambda + \mu} \quad \text{and} \quad
P_F = \frac{\lambda}{\lambda + \mu}
\]

Now, the module availability A is simply P_W by definition. Thus

\[
A = \frac{\mu}{\lambda + \mu} = \frac{1/\lambda}{1/\lambda + 1/\mu}
  = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{3}
\]

There are two main drawbacks of this simple, well-known model of
availability. First, it assumes that the detector is perfect; i.e., the
detector never fails. Second, it assumes that the time to detect failures
is negligible, or that the detection latency is zero. We present a model
below that removes both these drawbacks.

We make all the assumptions made earlier for the simple two-state
model. In addition, we assume that the time to detect failures is
exponentially distributed with mean 1/δ. Thus the detection rate is δ and
the Mean-Time-To-Detect-Failures (MTDF) is 1/δ. The time to failure and the
time to repair for the detector are exponentially distributed with mean 1/α
and 1/β, respectively. The module can be in any one of the four states:
W, F, D and C. In state W, the module is functioning properly; in state F,
the functional unit has failed but the failure is not yet detected. In
state D, the failure is detected and the functional unit is under repair.
In state C, the detector has failed and it is under repair. The state
diagram is given in Figure 4. The steady state probabilities for each of
these states can be obtained as

\[
\begin{aligned}
P_W &= \frac{1}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}, &
P_F &= \frac{\lambda/\delta}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}, \\
P_D &= \frac{\lambda/\mu}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}, &
P_C &= \frac{\alpha/\beta}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}.
\end{aligned}
\]

It is interesting to observe that when the module is in state F, it
has malfunctioned but the outside world does not know about it. Thus, from
an external point of view the module is said to be available when it is
either in state W or in state F. In reality, the module should be called
available only when it is in state W. Thus, we have the real availability
A_r = P_W and the apparent availability A_a = P_W + P_F. The purpose of the
on-line detector is to keep A_a and A_r as close to each other as possible.
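
The four-state model is easy to evaluate numerically. The sketch below is
a hedged illustration: the function and parameter names are hypothetical,
the rate values in the usage line are arbitrary, and the symbols assumed
are lam for the unit failure rate, mu for the repair rate, delta for the
detection rate, and alpha, beta for the detector failure and repair rates.

```python
def four_state_probabilities(lam, mu, delta, alpha, beta):
    """Steady state probabilities of states W, F, D and C."""
    denom = 1.0 + lam / mu + lam / delta + alpha / beta
    p_w = 1.0 / denom
    return {
        "W": p_w,                   # working properly
        "F": (lam / delta) * p_w,   # failed, not yet detected
        "D": (lam / mu) * p_w,      # failure detected, under repair
        "C": (alpha / beta) * p_w,  # detector failed, under repair
    }


def availabilities(lam, mu, delta, alpha, beta):
    """Return (real availability, apparent availability)."""
    p = four_state_probabilities(lam, mu, delta, alpha, beta)
    return p["W"], p["W"] + p["F"]


# Arbitrary illustrative rates (per hour), not taken from the report.
real, apparent = availabilities(lam=1e-3, mu=2.0, delta=0.5,
                                alpha=1e-4, beta=4.0)
print(real, apparent, apparent - real)
```

The gap between the two availabilities is P_F, so sweeping delta shows how
a faster detector closes it.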


Note that the real availability

\[
A_r = \frac{1}{1 + \frac{\lambda}{\mu} + \frac{\lambda}{\delta} + \frac{\alpha}{\beta}}
    = \frac{1}{1 + \lambda\left(\frac{1}{\mu} + \frac{1}{\delta}\right) + \frac{\alpha}{\beta}}
    = \frac{1}{1 + \frac{\text{MTTR} + \text{MTDF}}{\text{MTBF}} + \frac{\alpha}{\beta}}
\]

Now since α is usually much smaller than λ, we may let

\[
A_r = \frac{\text{MTBF}}{\text{MTBF} + (\text{MTTR} + \text{MTDF})} \tag{4}
\]

Comparing expression (4) with expression (3), we conclude that MTTR + MTDF
behaves like an "effective" repair time. The use of a more powerful
detector, i.e., a smaller value of MTDF, implies a reduction in the
effective repair time, which in turn implies an increase in real
availability. In fact, for fixed values of MTBF and MTTR, the largest real
availability results when we employ a detector with zero detection latency.

Next consider the probability of being in the undesirable state F

\[
\begin{aligned}
P_F &= \frac{\lambda/\delta}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta} \\
    &\approx \frac{\lambda}{\delta}\left(1 - \frac{\lambda}{\mu} - \frac{\lambda}{\delta} - \frac{\alpha}{\beta}\right) \\
    &= \frac{\lambda}{\delta}\left(1 - \frac{\lambda}{\mu} - \frac{\alpha}{\beta}\right) - \frac{\lambda^2}{\delta^2} \\
    &\approx \frac{\lambda}{\delta}\left(1 - \frac{\lambda}{\mu} - \frac{\alpha}{\beta}\right)
\end{aligned}
\]

Thus employing a more powerful detector (i.e., increasing the value of δ)
reduces P_F and hence brings the real availability A_r and the apparent
availability A_a closer together. In fact, a detector with an infinite
detection rate (or zero detection latency) implies that P_F = 0 and
A_a = A_r. In this case, there is no need to distinguish between the
concepts of real and apparent availabilities.

Finally, consider the apparent availability

\[
A_a = P_W + P_F = 1 - P_D - P_C
    = 1 - \left(\frac{\lambda}{\mu} + \frac{\alpha}{\beta}\right) A_r
\]

Now, since A_r increases with an increase in δ (i.e., a more powerful
detector), we conclude that the apparent availability reduces with an
increase in δ. Thus, A_a and A_r approach each other as δ increases and
A_a = A_r in the limit δ → ∞. Further analysis suggests that the rate of
decrease in A_a is very slow since

\[
\frac{dA_a}{d\delta}
  = -\left(\frac{\lambda}{\mu} + \frac{\alpha}{\beta}\right)
    \frac{\lambda}{\delta^2} A_r^2
\]

which is of second order in the small quantities λ/μ, α/β and λ/δ.
Thus, as a first order approximation, the apparent availability remains
constant, independent of the mean detection latency.

To fix our ideas, let λ = 10^[exponent illegible]/hr, μ = 2/hr,
α = 10^[exponent illegible]/hr, and β = 4/hr.
In Figure 5, we have plotted the apparent availability A_a and the real
availability A_r as functions of MTDF.
Figure 5.  Apparent Availabilities vs MTDF
(vertical axis: real and apparent availabilities)


Analysis of a Non-Maintained Module

We will now consider the analysis of a non-maintained module M
consisting of the functional unit U and the associated detector (or fault
monitor) D. Let U and C, respectively, denote the time to failure of the
unit and the detector. Let D be the time to detect a fault in the module.
Let T denote the time to fault indication. Note that U, C, D and T are all
random variables and T = min(U + D, C). Let R_a(t) and R_r(t) denote the
apparent and the real reliabilities of the module, respectively. We will
assume that U, D and C are mutually independent exponentially distributed
random variables with means 1/λ, 1/δ and 1/α, respectively. Then

\[
\begin{aligned}
R_a(t) &= P(T > t) = P(U + D > t,\ C > t) \\
       &= P(U + D > t)\, P(C > t) \quad \text{by independence} \\
       &= R_{U+D}(t)\, R_C(t)
\end{aligned} \tag{8}
\]

By exponential assumption,

\[
R_C(t) = e^{-\alpha t}, \quad R_D(t) = e^{-\delta t}, \quad
\text{and} \quad R_U(t) = e^{-\lambda t}.
\]

Therefore,

\[
R_{U+D}(t) = \frac{\delta}{\delta - \lambda}\, e^{-\lambda t}
           - \frac{\lambda}{\delta - \lambda}\, e^{-\delta t} \tag{9}
\]

(note that this is a hypoexponential distribution).
Then, from above, we get

\[
R_a(t) = \frac{\delta}{\delta - \lambda}\, e^{-(\lambda + \alpha) t}
       - \frac{\lambda}{\delta - \lambda}\, e^{-(\delta + \alpha) t} \tag{10}
\]

For computing the real reliability R_r(t), we note that the module ceases
to function properly when a fault occurs in either the unit or the detector.
Thus

\[
\begin{aligned}
R_r(t) &= P(U > t,\ C > t) \\
       &= P(U > t)\, P(C > t) \quad \text{by independence} \\
       &= R_U(t)\, R_C(t) \\
       &= e^{-(\lambda + \alpha) t}
\end{aligned} \tag{11}
\]

We note that in the absence of the detector, the real reliability is
e^{-λt}; therefore, employing a detector actually reduces the real
reliability.

Without a detector, α = 0 and δ is near zero; therefore, the apparent
reliability will be very high. Thus, the apparent reliability is also
reduced by employing a detector. The purpose of an on-line detector is to
close the gap between the values of the real and the apparent reliabilities.

We can also compute the real and apparent MTTF (MTTF_r and MTTF_a):

\[
\text{MTTF}_a
  = \frac{\delta}{(\delta - \lambda)(\lambda + \alpha)}
  - \frac{\lambda}{(\delta - \lambda)(\delta + \alpha)}
  = \frac{\delta + \lambda + \alpha}{(\lambda + \alpha)(\delta + \alpha)} \tag{12}
\]

and

\[
\text{MTTF}_r = \frac{1}{\lambda + \alpha} \tag{13}
\]

We define the detector effectiveness to be the ratio MTTF_r/MTTF_a.
The detector effectiveness is plotted in Figure 6 as a function of the
detection rate δ.
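
Expressions (10) through (13) can be evaluated directly. The sketch below
is illustrative only; the function names are hypothetical and the rate
values in the usage lines are arbitrary, with lam, delta and alpha standing
for the unit failure rate, the detection rate and the detector failure
rate.

```python
import math


def real_reliability(t, lam, alpha):
    """Expression (11): unit and detector both survive to time t."""
    return math.exp(-(lam + alpha) * t)


def apparent_reliability(t, lam, delta, alpha):
    """Expression (10): no fault indication by time t (delta != lam)."""
    return (delta * math.exp(-(lam + alpha) * t)
            - lam * math.exp(-(delta + alpha) * t)) / (delta - lam)


def mttf_real(lam, alpha):
    """Expression (13)."""
    return 1.0 / (lam + alpha)


def mttf_apparent(lam, delta, alpha):
    """Expression (12), in its combined form."""
    return (delta + lam + alpha) / ((lam + alpha) * (delta + alpha))


def detector_effectiveness(lam, delta, alpha):
    """Ratio MTTF_r / MTTF_a; approaches 1 as delta grows."""
    return mttf_real(lam, alpha) / mttf_apparent(lam, delta, alpha)


# Arbitrary illustrative rates (per hour), not taken from the report.
for delta in (0.01, 0.1, 1.0, 10.0):
    print(delta, detector_effectiveness(lam=1e-3, delta=delta, alpha=1e-4))
```

The ratio simplifies to (delta + alpha)/(delta + lam + alpha), which makes
the dependence on the detection rate explicit.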

2.2.1.3  Design

We now present a design model for a non-maintained module. We are
asked to choose the characteristics of an on-line detector that will
minimize the total cost. The two cost components that enter into our model
are the cost of the detector and the cost (or penalty) for the time the
system spends in the undesirable state. For the sake of simplicity, we will
assume that the detector has zero failure rate, i.e., α = 0.

The fraction of time spent by the module in the undesirable state is
easily computed to be λ/δ. Let the per unit time penalty of being in this
state be given by K_F. Then, the penalty is given by K_F λ/δ. To
characterize the cost of the detector, we assume that the unit U is an
arithmetic unit and the detector D is a modulo-m checker (see Figure 7).
The problem, then, is to determine the optimum value of m.

The cost of a modulo-m checker may be approximated by C_0 log m. Then
the total cost

\[
C = K_F \frac{\lambda}{\delta} + C_0 \log m \tag{14}
\]

Note that λ, K_F and C_0 are assumed to be fixed parameters, but δ is
expected to be a function of m.

A reasonable functional relationship is

\[
\delta = \delta_0\, m^{a} \tag{15}
\]

After substituting (15) in (14), we can determine the optimum value of m by
taking dC/dm and setting it equal to zero:

\[
\frac{dC}{dm} = -\frac{a K_F \lambda}{\delta_0\, m^{a+1}} + \frac{C_0}{m} = 0
\]

or

\[
m_{opt} = \left[ \frac{K_F\, \lambda\, a}{C_0\, \delta_0} \right]^{1/a} \tag{16}
\]

This shows that the larger the value of K_F relative to C_0, the larger
should be the value of m. In other words, if the penalty of being in the
undesirable state is large, we should choose a more powerful detector.
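
The cost model can be explored numerically. The sketch below assumes the
logarithm in (14) is the natural logarithm, which is what makes the
derivative of the checker cost C_0/m; the function names and the parameter
values in the usage lines are hypothetical and are not the fitted values
referred to in the text.

```python
import math


def total_cost(m, k_f, c_0, lam, delta_0, a):
    """Total cost (14) with delta = delta_0 * m**a substituted from (15)."""
    return k_f * lam / (delta_0 * m ** a) + c_0 * math.log(m)


def optimal_modulus(k_f, c_0, lam, delta_0, a):
    """Stationary point (16) of the total cost, treating m as continuous."""
    return (k_f * lam * a / (c_0 * delta_0)) ** (1.0 / a)


# Hypothetical parameter values, chosen only to exercise the formulas.
params = dict(k_f=1e6, c_0=1.0, lam=1e-5, delta_0=1.0, a=1.0)
m_opt = optimal_modulus(**params)
print(m_opt, total_cost(m_opt, **params))
```

Since a check modulus must be an integer, in practice the cost would be
compared at the integers on either side of the continuous optimum.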

It is hard to parameterize such a model. We obtained data from a
paper by J. Clary [5] and fitted the data to obtain the values of δ_0
and a.

Figure 7.  Design of a Non-Maintained Module
[Block diagram with an ERROR SIGNAL output. The problem is to determine
the check modulus m that will minimize the cost.]

Using these values and λ = 10^-5/hr, we can determine the optimal value
of m as a function of the relative cost K_F/C_0 from equation (16). This
function is plotted in Figure 8.


2.2.1.4  Work Planned for the Future

First,  we  plan  to  consolidate  all  the  above  models  of  systems  analysis. 
In  addition  to  the  four-state  model  presented  in  Section  II,  we  will  also 
include  models  with  multiple  fault  types  and  models  with  multiple  on-line 
detectors.  We  will  also  include  models  of  non-repai rable  systems. 

The  next  task  to  be  undertaken  is  the  development  of  system  design 
models  for  a repairable  module.  The  problem  is  to  choose  the  rates  X,  u,  <5 
and  a so  as  to  maximize  real  availability  subject  to  a cost  constraint. 
The  cost  of  the  module  consists  in  the  cost  of  the  unit,  the  .ost  of  the  de- 
tector, the  cost  of  the  repair  facility,  and  the  cost  associated  with  the 
module  state  F.  Recall  that  in  state  F,  the  module  has  failed  but  the  fail- 
ure is  not  yet  detected.  Such  a state  of  the  module  may  be  very  harmful 
and  there  may  be  a heavy  penalty  associated  with  it.  The  main  problem  in 
such  a model  is  how  to  characterize  various  cost  components  as  functions  of 
the  decision  variables.  We  plan  to  spend  a good  deal  of  time  on  this  phase 
of  the  research  and  we  anticipate  very  interesting  and  useful  results. 

Many  extensions  to  the  models  for  analysis  and  the  models  for  design 
are  evident  to  us.  We  may  include  various  types  of  static,  standby  or 
hybrid-redundant  schemes  in  our  models.  We  may  also  remove  the  all  per- 
vasive assumption  of  independence  and  consider  a system  consisting  of  asso- 
ciated components  [3].  Parameterization  and  actual  use  of  these  models  to 
analyze  real  systems  is  also  an  extensive  and  interesting  project. 

The  work  proposed  in  this  section  may  take  two  or  more  years  for  its 
completion.  The  help  of  at  least  one  and  perhaps  two  graduate  student 
assistants  is  desirable  for  this  massive  effort.  We  may  note  that  to  train 
prospective  students  to  work  along  these  lines,  a course  developed  by  this 
author  in  the  Department  of  Computer  Science  at  Duke  University  is  very 
helpful.  This  course  is  titled,  "Probability  Theory  and  Applications  to 
Computer Science and Electrical Engineering."  A good part of the course is
devoted  to  stochastic  models  of  system  reliability  and  availability. 
2.2.2  BIT FACILITY IDENTIFICATION AND EVALUATION

By 

Dr.  Peter  N.  Marinos 
Professor  of  Electrical  Engineering 
and  Computer  Science 
Duke  University,  Durham,  N.C. 

ABSTRACT 

This  part  of  the  report  represents  two  major  undertakings.  First,  it 
describes  a unique,  programmable  cellular  structure  capable  of  realizing  any 
arbitrary sequential machine; and second, it presents the design and detailed
description of a high-level digital computer simulator with fault injection
facilities.

The  motivation  for  developing  this  programmable  cellular  structure  was 
the  importance  of  modular  design  in  achieving  improvements  in  digital  system 
reliability  and  availability  while  retaining  system  flexibility.  The  pro- 
posed basic  cell  is  so  configured  that  it  makes  the  design  of  Built-In-Test 
(BIT)  facilities  a natural  extension  of  the  overall  system  design  process. 
The  approach  taken  here  in  arriving  at  a testable  cellular  structure  is  re- 
ferred to  as  "hardware  encoding"  to  distinguish  it  from  more  traditional  in- 
formation encoding  schemes.  The  "hardware  encoding"  of  the  cellular  struc- 
ture relies  on  the  same  basic  cell  used  to  configure  the  rest  of  the  cellu- 
lar structure  and  may  be  thought  of  as  the  hardware  analog  of  well-known 
information  encoding  procedures. 

The  high-level  digital  computer  simulator  with  fault  injection  facili- 
ties, which  was  developed  as  a Master's  thesis  in  Computer  Science  at  Duke 
University  under  the  supervision  of  Professor  P.  N.  Marinos,  represents  a 
very  useful  tool  in  evaluating  the  effectiveness  of  various  BIT  facilities. 
The  unique  feature  of  this  simulator  is  its  ability  to  combine  the  function- 
al flexibility  of  a simulated  hardware  organization  with  the  ability  to  pro- 
cess a typical  work  load  on  the  simulated  machine  subject  to  a specified 
fault  environment. 
2.2.2.1  Background

The  reliability  and  availability  of  digital  systems  can  be  increased 
by  the  use  of  built-in-test  (BIT)  facilities  capable  of  detecting  all  sys- 
tem faults  of  a specified  class.  Desirable  properties  to  be  possessed  by 
such  BIT  facilities  are: 


1.  Self-checkability, 

2.  General  applicability, 

3.  Fault  resolvability  to  a specified  modular  level, 

4.  Suitability for use with current technologies (e.g., LSI),

5.  Fault  management  and  fault  reporting  capability  for  the  purpose  of 
effecting  system  recovery  (i.e.,  system  repair  and/or  system  recon- 
figuration), and 

6.  Passive (i.e., non-interfering), continuous system monitoring
capability, at least until a fault has been detected, at which time the
BIT facility may become active and participate in system repairs.

The  objectives  of  our  task  are  specifically: 

1.  The  development  of  BIT  facilities  for  use  at  the  system  and  sub- 
system level  with  special  emphasis  on  the  implementation  and  dis- 
tribution of  such  facilities  throughout  the  entire  system,  and 

2.  The  evaluation  of  the  effectiveness  of  various  BIT  facilities  in 
terms  of  added  cost  and  complexity  required  for  their  implementa- 
tion, as  well  as  in  terms  of  the  level  of  system  protection  they 
provide. 


Work  performed  during  the  period  September  1,  1977  to  May  15,  1978  has 
considered  both  objectives  outlined  above,  and  it  is  presented  in  the  sequel 
as  Part-A  and  Part-B.  Part-A  describes  a programmable  cellular  structure 
with  "hardware  encoded"  BIT  facilities  while  Part-B  presents  a high-level 
digital  computer  simulator  with  fault  injection  facilities. 
2.2.2.2  Programmable Cellular Structures With Hardware Encoded BIT Facilities


Introduction 

The  advantages  of  cellular  arrays,  which  result  from  the  regularity  of 
their  iterative  structure,  have  been  sufficiently  documented  [1-7].  Memory 
manufacturers have fully exploited many unique features of cellular arrays,
and the presently realizable memories, with highly improved device densities,
production yields, size, cost, speed, noise, and reliability, constitute
excellent testimony of their success.

Since the primary motivation for integrated circuits (ICs) is the
reduction of interconnections and packages, which in turn translates into a
number  of  improvements  in  terms  of  cost,  power,  speed,  size,  noise,  and 
reliability,  it  is  quite  natural  for  IC  manufacturers  to  look  into  cellular 
structures  as  the  next  natural  step  towards  bringing  about  further  improve- 
ments in  system  integration,  system  reliability,  and  system  testability  and 
maintainability.  This  implies  incorporation  of  cellular  logic  design 
notions  into  the  design  of  what  has  been  traditionally  known  as  random 
logic  subsystems  such  as  control  units  and  arithmetic  and  logic  units  of 
digital  systems. 

There  are  three  main  reasons  why  random  logic  subsystems  have  remained 
largely  non-cellular  in  form,  namely:  lack  of  design  standardization; 
inadequate  testing  procedures  for  fault  detection  and  fault  diagnosis;  and 
poor  maintenance  schemes  in  terms  of  self-reconfiguration  in  the  event  of 
failure,  as  well  as  in  terms  of  preventive  maintenance.  These  are  major 
problem areas and are currently attracting a great deal of attention in
several industrial and university laboratories.  This interest is easily
justified by the fact that cost, device densities and yields of mass-produced
cellular structures using mature technologies have greatly improved
over the years, and we find ourselves now in the comfortable position of
being  able  to  build  very  economically  into  such  cellular  structures,  redun- 
dant functional  capability,  which  one  could  use  to  improve  system  inte- 
gration, system  testability  and  maintainability,  and,  in  general,  system 
reliability  and  availability. 
In recent publications, Manning [6] describes certain procedures for
automatic testing, configuration, and repair of cellular arrays, and Page
and Marinos [7] propose a programmable array for use in designing synchronous
sequential machines and demonstrate ways of actually embedding arbitrary
finite-state machines in such arrays.  What is proposed next is a natural
extension of these two independent research efforts, and it is based
on the strong belief that industry will recognize the many advantages of
using cellular structures throughout a digital system once the issues of
array standardization (both functional and structural), testability
and maintainability have been resolved in a practical and cost-effective
manner.

Cellular  Arrays  Utilizing  K-Out-Of-M  State  Assignments 

The  cellular  array  envisioned  in  this  study  is  comprised  of  programma- 
ble logic  cells  capable  of  supporting  all  the  combinational  logic  needs  of 
a system,  and  of  memory  cells  either  in  a physically  separate  cellular  array 
or  as  part  of  the  overall  programmable  cellular  structure.  The  separation 
of  the  combinational  logic  from  the  memory  unit  arises  very  naturally  in 
sequential  machine  design  and  it  is  additionally  justified  by  the  fact  that 
testing  procedures  for  memories  are  distinctly  different  from  those  used  for 
testing  the  non-memory  portion  of  a system.  For  these  reasons,  and  the  fact 
that  memories  in  cellular  form  will  always  be  available,  independently  of 
how  one  decides  to  implement  the  random  logic  of  control  units  and 
arithmetic-logic  units,  we  chose  to  maintain  this  separation  and  not  to 
distribute  memory  among  the  cells  of  the  programmable  cellular  structure. 

The  proposed  cellular  array  represents  a basic  logic  structure  one  can 
successfully  utilize  to  realize,  in  a programmatic  way,  any  arbitrary  se- 
quential machine.  The  design  approach  made  possible  by  the  proposed  array 
is  highly  modular  and  offers  many  opportunities  for  achieving  improvements 
in  digital  system  reliability  and  availability  while  retaining  system  flex- 
ibility. The  incorporation  of  Built-In-Test  (BIT)  facilities  in  systems 
configured  from  the  proposed  cellular  arrays  is  a natural  extension  of  the 
overall  system  design  process  requiring  no  special  hardware  considerations. 




The  main  objective  of  this  study  is  to  develop  and  distribute  the  BIT 
facilities  over  the  cellular  structure  of  systems  based  on  the  proposed 
cellular  array  and  to  evaluate  the  effectiveness  of  such  facilities  in  terms 
of  added  system  cost,  system  reliability,  system  availability,  and  system 
maintainability.  The  design  of  the  basic  cell  utilized  in  the  cellular 
array  has  been  motivated  by  the  requirements  and  properties  of  the  BIT  fa- 
cilities which  were  outlined  earlier. 

Characterization of Synchronous Sequential Machines:  A finite-state
synchronous sequential machine is described by the algebraic structure

$M = \langle X, Z, Q, \delta, \omega \rangle$, where

$X$ = a finite set of input symbols $(x_1, x_2, \ldots, x_n)$ such that
$x_i \in \{0,1\}$, $i = 1, 2, \ldots, n$;

$Z$ = a finite set of output symbols $(z_1, \ldots, z_p)$ such that
$z_j \in \{0,1\}$, $j = 1, 2, \ldots, p$;

$Q$ = a finite set of states $(q_1, q_2, \ldots, q_t)$ defined by the
state variables $(y_1, y_2, \ldots, y_m)$ such that $m \geq \log_2 t$;

$\delta$ : $X \times Q$ into $Q$ is the next-state function;

$\omega$ : $X \times Q$ onto $Z$, or $Q$ onto $Z$, is the output function.


The general form of the excitation and output functions of a sequential
machine may be written as follows:

$$F_r(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m) = \sum_{i=1}^{t} f_{i,r}(x_1, x_2, \ldots, x_n) \cdot q_i \qquad (1)$$

where $r = 1, 2, \ldots, m + p$, and $q_i = \tilde{y}_1 \tilde{y}_2 \cdots \tilde{y}_m$,
with $\tilde{y}_j$ denoting the state variable $y_j$ either in its
complemented or uncomplemented form (i.e., $q_i$ denotes a min-term of the
state variables $y_j$, $j = 1, 2, \ldots, m$).

For  reasons  outlined  elsewhere  [7,8],  state  assignments  based  on  mono- 
tone, k-out-of-m  codes  will  be  utilized.  Among  the  many  advantages  offered 
by  these  codes,  the  one  of  interest  in  this  case  is  their  utility  in  error 
detection,  and  in  designing  fail-safe  sequential  machines  [9,10].  In  view 
of  such  a state  assignment,  one  may  rewrite  equation  (1)  in  the  form, 

$$F_r = \sum_{i=1}^{t} f_{i,r}(x_1, x_2, \ldots, x_n) \cdot G_i(m,k) \qquad (2)$$

where $G_i(m,k) = y_{i_1} y_{i_2} \cdots y_{i_k}$ and
$y_{i_j} \in (y_1, y_2, \ldots, y_m)$, $j = 1, 2, \ldots, k$.
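The role of the product term $G_i(m,k)$ can be illustrated with a short sketch. It assumes a 2-out-of-4 state assignment chosen purely for illustration; the function names `k_out_of_m_codewords` and `state_product` are hypothetical and do not come from the report. Because every code word contains exactly k ones, the AND of the k selected state variables is 1 only for the code word of the state in question, so k variables suffice to decode a state.

```python
from itertools import combinations


def k_out_of_m_codewords(m, k):
    """Return all m-bit words (as tuples) that contain exactly k ones."""
    words = []
    for ones in combinations(range(m), k):
        words.append(tuple(1 if j in ones else 0 for j in range(m)))
    return words


def state_product(selected, y):
    """G_i(m,k): AND of the k state variables selected for state i."""
    return int(all(y[j] for j in selected))


# Illustrative assumption: m = 4 state variables, k = 2 ones per code word.
m, k = 4, 2
codewords = k_out_of_m_codewords(m, k)
# For each state, the selected variables are the positions of its ones.
selections = [tuple(j for j, bit in enumerate(w) if bit) for w in codewords]

# Each product G_i is 1 for its own code word and 0 for every other one.
for i, sel in enumerate(selections):
    for j, word in enumerate(codewords):
        assert state_product(sel, word) == (1 if i == j else 0)

print(len(codewords), "states decoded with", k, "variables per product")
```

The same check fails for an unrestricted binary assignment, where a min-term over all m variables would be needed to single out a state.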


Equation (2) suggests a linear array, each cell of which is algebraically
described by the function

$$A_{i,r} = A_{i-1,r} + f_{i,r}(x_1, x_2, \ldots, x_n) \cdot G_i(m,k) \qquad (3)$$

Figure 1 shows the structure of such a cell with an n-input programmable
universal logic module (PULM-n) implementing the function $f_{i,r}(x_1, x_2, \ldots, x_n)$.
The PULM-n unit is programmed via the associated programming register.  The
linear cellular array shown in Figure 2 implements the function $F_r$ given
by expression (2), and the two-dimensional cellular structure given in
Figure 3 illustrates a cellular array capable of implementing any required,
finite number of excitation and output functions $F_r$ necessary for the
implementation of a finite-state, synchronous sequential machine.  One of the
structural advantages of the array shown in Figure 3 is its ability to
support finite-state, synchronous sequential machines of arbitrary
"state-space" cardinality without requiring any structural changes in the
basic cell of the array.

Description  of  Basic  Cell:  The  major  module  of  the  basic  cell  shown 
in  Figure  1 is  the  so-called  programmable  universal  logic  module  (PULM-n) 
capable  of  realizing  any  switching  function  of  n binary  variables.  The  pro- 
grammability of  this  module  is  made  possible  through  an  appropriate  field 
in  the  programming  register  while  the  n binary  variables  used  by  the  module 
to  form  the  desired  n-variable  switching  function  are  provided  via  an  input 
selector  and  accessed  through  the  n-BUS  under  the  control  of  the  programming 
register.  Finally,  a third  field  in  the  programming  register  is  used  to 
select the state, $q_i$, associated with the cell in question.

The  Input-Selector  and  State-Selector  units  receive  primary  input  and 
secondary  input  (or  state)  information  via  the  X-BUS  and  Y-BUS,  respective- 
ly. These  two  busses  may  be  arbitrarily  large  in  size  while  the  q-BUS  and 
n-BUS,  referred  to  earlier,  are,  for  practical  considerations,  of  signifi- 
cantly smaller  size.  In  the  case  of  the  n-BUS,  this  is  motivated  by  the 
growth  in  size  of  the  PULM-n  unit  as  n becomes  larger;  with  respect  to  the 
q-BUS, one is interested in satisfying the state-space of a machine with the
minimum number of state variables.  For the K-out-of-m codes used here, the
choice of K, which denotes the size of the q-BUS, must be such that it
produces the largest possible number of states from the m state variables
brought in via the Y-BUS.  Thus, the usual choice for K is such that the
expression $\frac{m!}{K!\,(m-K)!}$ is maximized.  It should be noted that with the X-BUS
there is an extra line known as the "program mode" line used for controlling
the programming register, and its specific function will be revealed
shortly.
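The choice of K discussed above can be checked numerically. The following is a minimal sketch, not part of the original design procedure; the helper name `best_k` is hypothetical. It evaluates the binomial coefficient m!/(K!(m-K)!) for every K and returns the smallest K that attains the maximum, together with the number of states that choice supports.

```python
from math import comb


def best_k(m):
    """Return (K, number of states) maximizing m! / (K! (m - K)!).

    When two values of K tie (odd m), the smaller K is returned, which
    also gives the narrower q-BUS.
    """
    k = max(range(m + 1), key=lambda j: comb(m, j))
    return k, comb(m, k)


for m in (4, 5, 6):
    k, states = best_k(m)
    print(f"m={m}: K={k}, states={states}")
```

As expected, the maximum falls at K equal to m/2 (rounded down in this sketch), so a 3-out-of-6 assignment accommodates up to 20 states.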
A unique feature of the proposed basic cell is the dual function served
by the X- and Y-BUSSES.  In addition to their main function of importing
primary and state information to the cell, they also serve two additional
functions:  The X-BUS is used to program the programming register, provided
the  cell  in  question  has  been  properly  addressed  via  the  Y-BUS  to  generate 
the  required  "program  enable"  control  signal,  which  is  essential  in 
reprogramming  the  PULM-n  module.  Thus,  the  second  function  served  by  the 
Y-BUS  is  to  address  uniquely  the  cell  of  interest  by  making  use  of  a unique 
"cell  address  decoder."  The  dual  functions  served  by  the  X-  and  Y-BUSSES 
constitute  a very  important  design  consideration  since  they  help  maintain  a 
low  pin-count  for  the  cell.  In  the  "program  mode"  the  cell  output  is 
disregarded. 

Another  important  feature  of  the  cell  in  Figure  1 is  the  fact  that  all 
the  incoming  information  is  reduced  or  "fully  decoded"  to  a single-line  out- 
put which  is  very  important  from  the  standpoint  of  pin-count. 

Description  of  Cellular  Array:  An  aggregate  of  basic  cells  arranged 
as  shown  in  Figure  2 forms  a cellular  array.  Each  cell  has  full  access  to 
the  X-  and  Y-BUS,  and  function  realization  is  effected  along  a column.  The 
upper boundaries of the array are set to zero, and the realized function
along each column is output as $F_r$ and represents either an excitation
or output function.

Each  cell  in  a column  is  programmed  to  account  for  a specific  term  of 
the  function  which  is  implemented  in  a sum-of-products  form.  For  functions 
which  are  independent  of  state  information,  the  "state  selector"  in  each 
cell  is  capable  of  providing  the  Boolean  constant  1 under  control  from  the 
appropriate  field  of  the  programming  register,  thus  facilitating  the  genera- 
tion of  product  terms  not  requiring  state  information. 
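The column-wise realization described above can be sketched in software. This is only an illustrative model under stated assumptions: the PULM-n is represented by a truth table, the programming register by the three arguments of the hypothetical function `make_cell`, and the example machine (a 1-out-of-2 state assignment with two primary inputs) is invented for the demonstration. Each cell ORs its own product term onto the value passed down from the cell above, and the top boundary is zero.

```python
def make_cell(truth_table, input_select, state_select):
    """Model one basic cell: a PULM-n truth table plus its two selectors.

    truth_table  -- list of 2**n bits, indexed by the selected inputs
    input_select -- indices into the X-BUS that feed the PULM-n
    state_select -- indices into the Y-BUS forming the state product;
                    an empty tuple models the Boolean constant 1
    """
    def cell(a_prev, x, y):
        index = 0
        for j in input_select:
            index = (index << 1) | x[j]
        f = truth_table[index]                      # PULM-n output
        g = int(all(y[j] for j in state_select))    # state-selector output
        return a_prev | (f & g)                     # pass the running sum down
    return cell


def evaluate_column(cells, x, y):
    """Evaluate one column; the upper boundary of the array is zero."""
    a = 0
    for cell in cells:
        a = cell(a, x, y)
    return a


# Assumed example: states q1 = (1,0), q2 = (0,1); F = x0*q1 + (NOT x1)*q2.
column = [
    make_cell([0, 1], input_select=(0,), state_select=(0,)),
    make_cell([1, 0], input_select=(1,), state_select=(1,)),
]

print(evaluate_column(column, x=(1, 0), y=(1, 0)))
print(evaluate_column(column, x=(0, 1), y=(0, 1)))
```

A function that does not depend on state information is obtained by giving a cell an empty `state_select`, which corresponds to the state selector supplying the constant 1.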

Using  the  algorithmic  procedure  developed  by  Page  and  Marinos  [7],  one 
may  now  embed  any  arbitrary,  synchronous  sequential  machine  in  the  cellular 
array  of  Figure  2.  The  cellular  array  described  in  this  study  utilizes  a 
basic  cell  which  is  functionally  more  complex  than  the  one  used  by  Page  and 
Marinos  [7],  and,  as  a result,  it  leads  to  more  flexible  cellular  struc- 
tures. It  should  be  noted  that  the  proposed  cellular  array  permits  local- 
ized selection  of  both  state  variables  and  primary  input  variables  and 
facilitates its own programmability via the primary and secondary input
busses  X and  Y.  Each  cell  in  the  array  is  individually  addressed,  and  the 
array  is  assumed  to  obtain  its  state  information  from  a separate  memory 
array. 

Cellular  Array  With  Hardware  Encoded  BIT  Facilities 

The  term  "hardware  encoding"  is  used  to  mean  "the  process  of  mapping 
in  a systematic  way  any  of  the  well-known  information  encoding  schemes  onto 
a cellular  array  structure." 

The  manner  in  which  such  a mapping  is  carried  out  is  greatly  simplified 
by  assuming  that  all  the  cells  of  a row  are  programmed  to  receive  the  same 
state code.  This is only a topological and not a functional restriction
imposed on the machine which is embedded in the cellular array and it is made
for  the  sake  of  analytical  convenience.  Thus,  the  mapping  of  a sequential 
machine  into  a cellular  array,  using  the  algorithm  by  Page  and  Marinos  [7], 
results  in  a cellular  machine  structure  in  which  each  row  is  uniquely 
associated  with  a machine  state,  except  in  the  case  of  state  splitting  to 
be  discussed  later. 

Figure 3 illustrates a cellular array in which the $i$th row is associated
with machine state $q_i$.  The busses and other details of the cellular array
are omitted for the sake of clarity.  The four leftmost columns of the
array are used to implement machine functions (i.e., excitation and output
functions) while the remaining three (i.e., $F_{c1}$, $F_{c2}$ and $F_{c3}$) denote the
built-in check functions employed, in this case, to encode the information
carrying functions $F_1$, $F_2$, $F_3$ and $F_4$ according to the well-known single-error
correcting Hamming code.  These checking functions are implemented in the
last three columns of the array in a manner similar to that used to form
any other machine function.  The resulting array is one with "hardware
encoded" BIT facilities that make continuous and non-interfering monitoring
of the encoded output (i.e., $F_1 F_2 F_3 F_4 F_{c1} F_{c2} F_{c3}$), using standard
error checking schemes, possible.

Although the illustration in Figure 3 uses a single-error-correcting
Hamming code, other encoding schemes may be just as easily employed by
utilizing the appropriate relations which must hold between the "information
cell" functions $f_{i,r}$ and the "checking cell" functions $f_{i,cj}$.




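For the single-error-correcting Hamming code used in Figure 3, the
well-known relationships are given by the equations

$$f_{i,c1} = f_{i,1} \oplus f_{i,2} \oplus f_{i,4}$$

$$f_{i,c2} = f_{i,1} \oplus f_{i,3} \oplus f_{i,4}$$

$$f_{i,c3} = f_{i,2} \oplus f_{i,3} \oplus f_{i,4}$$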

assuming,  of  course,  even  parity.  The  above  equations  prescribe  the  respec- 
tive "checking  cell"  functions  that  must  be  programmed  so  that  when  the 
machine  is  in  state  qr,  the  rth  row  of  the  array  is  "hardware  en- 
coded" in  accordance  with  the  coding  scheme  of  interest. 
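How such an encoded output might be monitored can be sketched as follows. The sketch assumes the standard single-error-correcting Hamming relations with even parity; in particular, the first check relation (c1 = f1 XOR f2 XOR f4) is taken from the standard code and is an assumption here. The function names are hypothetical. A checker recomputes the three check bits from the four information outputs and compares them with the check columns; a nonzero syndrome signals a fault, and for a single error the syndrome pattern identifies the faulty column.

```python
from itertools import product


def check_functions(f1, f2, f3, f4):
    """Even-parity check bits for the four information outputs (assumed
    standard Hamming relations)."""
    c1 = f1 ^ f2 ^ f4
    c2 = f1 ^ f3 ^ f4
    c3 = f2 ^ f3 ^ f4
    return c1, c2, c3


def syndrome(word):
    """Compare received check bits with recomputed ones; (0,0,0) means
    no detectable error in the 7-bit word (f1,f2,f3,f4,c1,c2,c3)."""
    f1, f2, f3, f4, c1, c2, c3 = word
    e1, e2, e3 = check_functions(f1, f2, f3, f4)
    return (c1 ^ e1, c2 ^ e2, c3 ^ e3)


# Exhaustive check: every single-bit error yields a nonzero syndrome that
# depends only on the position of the faulty column.
syndrome_of_position = {}
for info in product((0, 1), repeat=4):
    clean = list(info) + list(check_functions(*info))
    assert syndrome(clean) == (0, 0, 0)
    for pos in range(7):
        faulty = list(clean)
        faulty[pos] ^= 1
        s = syndrome(faulty)
        assert s != (0, 0, 0)
        assert syndrome_of_position.setdefault(pos, s) == s

print(len(set(syndrome_of_position.values())), "distinct single-error syndromes")
```

Since the seven syndromes are distinct, the checker can both detect a single faulty column and resolve which one it is, which corresponds to the fault-resolvability property listed among the desirable BIT properties.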

In  the  case  of  a combinational  (i.e.,  memoryless)  network,  the  check- 
ing functions  are  realized  in  a manner  similar  to  the  sequential  network 
case  by  assuming  the  network  as  being  a one-state  machine.  Whenever  the 
functional  complexity  of  a machine  exceeds  the  resources  of  a row  in  a 
cellular  array,  then  more  than  one  row  may  be  associated  with  a single 
state  resulting  in  what  we  have  previously  referred  to  as  state  splitting. 

The  approach  outlined  above  for  realizing  BIT  facilities  is  compatible 
with  LSI  technologies,  and  it  makes  full  utilization  of  well-known  infor- 
mation encoding  schemes  used  for  error  detection  and  diagnosis  in  digital 
systems.  It  is  worth  noting  that  "hardware  encoded"  BIT  facilities,  as 
proposed  here,  do  not  require  specially  designed  hardware,  and  they  are 
incorporated into the overall system in a way that makes them a natural
extension of the non-encoded machine structure.


There  are  many  unresolved  questions  concerning  optimal  machine  layout 
and  mapping  onto  a cellular  array;  similarly,  there  are  problems  with 
respect  to  encoding  schemes  suitable  for  various  machine  layouts.  These 
and  other  issues,  such  as  distribution  of  spare  cells  over  a cellular  array 
and  maintenance  policies,  which  impact  the  cost-effectiveness  and 
reliability  of  such  arrays,  are  the  object  of  our  continuing  research 
efforts  in  this  area. 
This work is based upon the premise that the addition of Built-In-
Test  (BIT)  hardware  and  software  for  the  handling  of  undesired  events 
(UEs)  may  increase  reliability  in  modular  computer  systems,  i.e.,  increase 
the  likelihood  that  the  computer  system  will  help  us  when  we  need  its 
help.  Currently,  many  software  engineers  espouse  a "zero  defects"  philos- 
ophy, one  that  expects  software  to  be  correct.  Such  a philosophy  is  im- 
practical; even  though  program  verification  and  structured  programming  are 
useful  tools,  UEs  will  almost  invariably  occur.  Our  proposed  approach  to 
system design has three main components:  anticipation of the occurrence
of  possible  UEs;  detection  of  occurring  UEs;  response  to  the  UEs  (correc- 
tion where  possible).  Such  an  approach  should  be  useful  in  the  develop- 
ment of  reliable  modular  computer  systems. 

The  specific  project  subtask  is  to  examine  the  hardware/software 
interface  requirements  in  modular  computer  systems  that  support  the 
handling  of  UEs  by  software  and  to  provide  generally  useful  guidelines  for 
the  specification  of  such  interfaces.  In  addition  to  the  previous  work 
that  Dr.  D.  L.  Parnas  has  done  on  this  subject,  other  research  has  been 
done  primarily  in  Europe.  As  a result,  the  most  comprehensive  research 
report  known  to  us  was  written  in  German.  Our  earliest  efforts  were 
directed,  therefore,  to  making  this  literature  available  in  English. 
(Reports  of  previous  research  concerning  BIT  were  provided  by  the  Research 
Triangle  Institute.) 

The most relevant sections of the research report, Reaktion auf
Unerwuenschte Ereignisse in Hierarchisch Strukturierten Software-Systemen
(Reaction to Undesired Events in Hierarchically Structured Software
Systems), by Dr. H. Wuerges, have been translated into English (refer
to Section 2.3.1).  Wuerges, who worked closely with Dr. D. L. Parnas,
describes the methods for handling UEs, paying particular attention to
hardware/software interfaces and cost factors.  Of major interest are his
suggestions for hardware/software interfaces (reference Chapter 12 of
Wuerges' thesis).  Wuerges' experimental work was based on a Siemens 4004
architecture.  In his thesis, he demonstrates how hardware design errors
prevented effective software recovery.

The  fundamental  issues  that  are  addressed  in  the  research  are:  the 
redundancy  of  code  and  of  information;  distinction  between  functions  best 
performed by the hardware and those best performed by the software; the
specification  of  the  hardware/software  interface.  For  example,  the  re- 
dundancy of  data  in  a system  in  which  the  frequency  of  system  failure  is 
high  may  be  considerably  more  economical  than  the  costs  incurred  by  the 
loss  of  the  data. 

We have modified and/or expanded many of the ideas contained in
Wuerges' thesis.  For example, anticipation of the occurrence of possible
UEs plays a major role in our philosophy of system design -- one cannot
handle a UE whose occurrence has not been anticipated.  Wuerges provided a
classification  scheme  for  possible  UEs  that  was  useful  for  his  purposes, 
but  which  was  rather  limited  in  scope;  i.e.,  the  classification  is 
peculiar  to  the  machine  on  which  he  conducted  his  experimental  work.  A 
more  general  handling  of  the  topic  required  a precise  specification  of  the 
possible  UEs.  Such  a classification  has  been  developed  in  order  to  pro- 
vide a comprehensive  base  for  the  work  being  undertaken. 

Three  classes  of  fault  sources  are  distinguished:  the  hardware,  the 
software  and  external  sources.  Among  hardware  defects  are  defective  elec- 
tronic module,  defective  I/O  transducer,  defective  interconnection,  and 
crosstalk  and  noise.  The  definitions  of  the  first  three  depend  upon  the 
partitioning  of  the  system.  Software  faults  include  incomplete  coding  and 
violation  of  the  applicability  conditions.  Power  source,  defective  off- 
line storage  media,  operator  (including  system  configuring  and  control) 
and  input  data  are  external  fault  sources.  As  important  as  the  fault 
sources  are  the  ways  in  which  faults  manifest  themselves.  As  an  example, 
the  failure  of  a bit  in  a location  where  a program  segment  is  stored  may 
manifest  itself  as  a program  error,  i.e.,  improper  or  undefined  operation 
or  operand  code.  Possible  fault  manifestations  are  given  in  Table  1. 

Preparation  for  UEs  is  a rather  useless  action  without  the  ability  to 
detect  them.  Wuerges  assumed  no  BIT  --  our  research  is  based  on  a system 
which  includes  BIT  hardware  and  software  modules  for  the  detection  of  UEs 
and  for  the  provision  of  information  concerning  occurring  UEs.  Detection 
of  a failure  (by  the  BIT  hardware  module)  generates  a high-priority  inter- 
rupt to  the  functional  hardware  and  causes  a transfer  of  control  to  the 
BIT  software  module.  (Refer  to  Figure  1 for  characterization  of  the  rela- 
tionships between  the  various  hardware  and  software  modules). 
Table 1

Fault Manifestations

I.  HARDWARE

    A.  INCORRECT MODULE OPERATION
    B.  INCORRECT I/O TRANSDUCER OPERATION
    C.  FAULTY DATA TRANSFER BETWEEN MODULES
    D.  FAULTY DATA TRANSFER FROM I/O TRANSDUCERS
    E.  UNRESPONSIVE I/O PORT
    F.  UNREASONABLE OR INCORRECT REGISTER* CONTENTS
    G.  UNREASONABLE VOLTAGE OR CURRENT LEVELS

II.  SOFTWARE

    A.  UNREASONABLE OR INCORRECT REGISTER** CONTENTS
    B.  INCORRECT INPUT DATA OR FAULTY INPUT DATA TRANSFER

III.  EXTERNAL

    A.  PATHOLOGICAL DISPLAY
    B.  PATHOLOGICAL OUTPUT DATA
    C.  SYSTEM CRASH

*   ALL REGISTERS, INCLUDING MEMORY
**  SOFTWARE-ACCESSIBLE REGISTERS: ACCUMULATOR, INDEX REGISTERS, STACK
    POINTER, ETC.
Programmable Computer With Undesired Event Software and Built-In-Test Hardware


Response  to  detected  UEs  is  the  third  facet  of  our  approach  to  the 
design  of  reliable  modular  computer  systems.  Hardware-detected  UEs  will 
be reported by means of traps, as described in Wuerges' thesis.  Similar
UEs would be handled by a single parameterized routine; the number of such
routines is dependent upon the explicit classification of hardware-detected
UEs, the system hardware and costs.  Special care will be taken
to preserve the integrity of the various modules, as described by the
"information-hiding" principle of Parnas (reference CACM, December 1972).
Initially, basic information will be provided about as many UEs as it
would be possible for us to detect and "handle"; i.e., checks for many UEs
are provided.  As outlined in Wuerges' thesis, the separation of code for
the normal case and for the UE case is strictly maintained; this allows
easy  and  independent  modification  of  either.  Checks  for  certain  UEs  will 
be  removed  when  experience  indicates  that  these  UEs  occur  only  infrequent- 
ly. This  will  reduce  the  additional  costs  incurred  by  the  inclusion  of 
UE-handling  tools.  After  initial  system  testing  is  performed,  the  UE- 
handling  routines  can  be  modified  to  allow  desired  capabilities. 

Part  of  the  work  is  devoted  to  the  determination  of  the  best  means 
of  providing  necessary  functions,  i.e.,  whether  by  the  hardware  or  by 
software.  The  basic  premise  of  UE-handling  is  that  a UE  is  best  "handled" 
at  a level  in  which  it  is  caused.  If  a UE  is  detected  at  a lower  level 
(here  the  hardware),  UE-handling  is  attempted  at  each  higher  level  in  the 
hierarchy.  This  requires  reconstruction  of  the  state  of  the  program  at 
each  level.  If  no  level  has  sufficient  information  to  respond  to  the  UE, 
termination  results  at  the  level  of  the  ultimate  user.  Of  major  concern 
is  the  provision  of  additional  registers  and/or  reserved  memory  locations, 
additional  software  (including  the  possibility  of  software  to  perform  some 
hardware  functions)  and  modified  or  additional  instructions.  Such  deci- 
sions, of  course,  depend  upon  performance-cost  trade-off.  Particularly 
noteworthy  is  the  addition  of  three  instructions  described  by  Wuerges: 

1.  CONTINUE  - If  recovery  from  a UE  is  possible,  then  continue 
execution  at  the  instruction  immediately  following 
the  action  in  which  the  UE  occurred. 
2.  RETRY  - Attempt to repeat the action in which the UE occurred,
if the occurrence of the UE is not "fatal."

3.  CLEAR  - The  user  does  not  wish  to  continue  the  interrupted 
operation,  in  which  case  the  effects  of  the  operation 
must  be  removed. 

These  are  some  of  the  basic  tools  needed  to  allow  for  recovery  from  UEs  in 
computer  systems. 
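
The three recovery instructions can be illustrated with a minimal sketch. It is not Wuerges' design or an 8080 implementation: the names Recovery, UndesiredEvent, run_action and make_flaky are hypothetical, the "machine state" is a plain dictionary, and a saved shallow copy stands in for whatever the hardware would need in order to remove the effects of an interrupted action.

```python
from enum import Enum


class Recovery(Enum):
    CONTINUE = "continue"  # resume after the action in which the UE occurred
    RETRY = "retry"        # repeat the action in which the UE occurred
    CLEAR = "clear"        # abandon the action and remove its effects


class UndesiredEvent(Exception):
    """Hypothetical stand-in for a trap raised by a failing action."""


def run_action(action, state, ue_routine, max_retries=3):
    """Run action(state); on a UE, let the user's UE routine choose the recovery."""
    snapshot = dict(state)  # shallow copy, kept so that CLEAR can undo partial effects
    attempts = 0
    while True:
        try:
            action(state)
            return "done"
        except UndesiredEvent as ue:
            decision = ue_routine(ue, attempts)
            if decision is Recovery.RETRY and attempts < max_retries:
                attempts += 1
                continue
            if decision is Recovery.CONTINUE:
                return "continued"  # caller proceeds with the next action
            # CLEAR, or RETRY with the retry budget exhausted (an assumption
            # of this sketch): remove the effects of the action.
            state.clear()
            state.update(snapshot)
            return "cleared"


def make_flaky(failures):
    """Build an action that raises a UE the first `failures` times it runs."""
    remaining = [failures]

    def action(state):
        state["partial"] = True
        if remaining[0] > 0:
            remaining[0] -= 1
            raise UndesiredEvent("transient fault")
        state["result"] = 42

    return action
```

In this sketch the choice among the three actions is left entirely to the routine supplied by the user of the abstract machine, which matches the premise that only that level knows which reaction is sensible.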

We  have  examined  the  specifications  for  the  hardware/software  inter- 
face and  propose  to  specify  the  hardware/software  interface  for  a simple 
modular  system,  following  the  guidelines  proposed  by  Wuerges  and  expanding 
them as described above. The basic model consists of an Intel 8080 CPU, a
read-only  memory,  a random-access  memory  and  a BIT  status  module  (refer  to 
Figure  2).  The  model  is  particularly  appropriate  for  our  needs,  since  al- 
most no  error-handling  capabilities  are  provided  (only  an  external  inter- 
rupt). In  specifying  the  hardware/software  interface,  software  for  UE- 
handling  may  be  introduced  in  some  programs,  yet  programs  that  would  have 
run  on  the  system  without  this  feature  may  still  be  run. 

A simulator that implements a subset of the Intel 8080A instruction
set  has  been  provided  by  Bowles.  A modification  of  this  simulator,  in 
addition  to  simulations  of  other  components  of  the  system,  may  be  used  for 
the  purposes  of  system  testing.  Proposed  changes  to  the  Intel  8080A  in- 
clude the  addition  of:  traps  (refer  to  Wuerges'  thesis);  the  commands 
CONTINUE,  RETRY,  CLEAR;  hardware-implemented  BIT.  The  simulation  of  the 
entire  system  and  system  testing  had  been  scheduled  tentatively  for  the 
summer. 

The  primary  goal  of  this  endeavor  is  total  system  reliability,  i.e., 
the  development  of  systems  in  which  undesired  events  can  be  recognized 
quickly  and  efficiently.  Wuerges'  thesis  is  the  only  work  known  to  us 
that  prescribes  guidelines  for  building  systems  whose  entire  system  design 
is  predicated  on  a systematic  approach  to  UE-handling  throughout  all 
system  components.  No  military  or  commercial  systems  presently  take  this 
approach.  We  intend  to  experimentally  evaluate  the  usefulness  of  this 
approach,  using  the  modular  computer  system  described  above.  The  concepts 
described in Wuerges' thesis and the description of his experiences on a
Siemens machine have provided the basis for this research. Past experience
has shown that the addition of UE-handling tools provides results (with
respect to system reliability) that are comparable to or better than those
of a similar effort expended in some other manner. We expect to provide a
simulation of the system described above, test results, and generally
useful guidelines for designing reliable systems, showing their
implementation by means of the model system.

Figure 2.  Modular Digital Computer w/Built-In-Test (BIT)

CHAPTER  3


REQUIREMENTS  FOR  THE  UE-HANDLING 


Structural  Aspects 

In  most  programming  and  system  implementation  languages,  error- 
handling (if  at  all  practical)  is  possible  only  by  incurring  undesirable 
program  complexity.  The  presence  of  all  possible  UEs  must  be  checked  in 
the program, and reaction to these UEs must result in normal program
termination. If the number of possible UEs is very large, as, for example,
in the case of I/O devices or address translation (virtual to real), then
very complex and unclear (i.e., hard to decipher) programs result. This
means that changes in the normal code or in the UE-handling are hardly
possible. At least in principle, making changes increases the danger of
introducing new errors. Since error-handling of the problems here is
already difficult enough, easy surveyability must be guaranteed; otherwise,
there exists the danger that error-handling introduces more errors than are
treated (i.e., remedied or bypassed).

A separation  between  code  for  the  normal  case  and  code  for  the  error 
case  provides  for  the  ability  to  oversee  both  parts  and  allows  independent 
changing  of  both  parts.  Since  experience  in  dealing  with  UEs  has  caused 
UE-handling  to  become  more  comprehensive,  this  has  a special  significance. 

An  approach  to  UE-handling  should  support  this  separation  and  thereby 
reduce  the  complexity  of  the  entire  program. 

A second  important  point  concerns  the  responsibility  for  UE-handling. 

The  reaction  to  a UE  in  a program  depends  on  the  objectives  of  this  program 
and  on  the  effects  that  the  occurring  UE  has  on  these  objectives.  There- 
fore, each  group  of  programmers  that  writes  a program  for  a particular 
abstract  machine  should  prepare  code  for  the  case  where  their  program  has 
errors  or  where  the  abstract  machine  that  is  used  is  unable  for  some  reason 
to  execute  this  program.  Only  these  programmers  know  what  their  program 
was  intended  to  do  and,  therefore,  what  actions  are  possible  and  sensible 
for  handling  this  UE.  Other  programmers  know  either  nothing  at  all  about 
this program or they know only its specifications, and not its effects.
Therefore, they also cannot determine what actions should be chosen.
Programs for managing a virtual memory, for example, do not know what is in
a segment  and  for  what  purpose  this  segment  is  used.  Whether  the  content 
of  the  segment  can  be  redetermined,  or  what  effects  the  loss  of  a segment 
has,  is  known  only  to  the  user  who  produced  this  segment.  In  a system  in 
which  the  user  himself  cannot  react  to  UEs  in  his  program,  there  remains 
only  the  possibility  of  termination  of  the  program. 

The  programmers  of  one  level  are  also  most  easily  in  a position  to 
provide information to higher levels, when they themselves are not respon-
sible for an occurring UE or when they see no possibility of a reaction to
it. 

Both  of  the  user's  programs,  that  for  the  normal  case  (the  desired 
case)  as  well  as  for  the  UE-handling,  should  use  the  same  abstract  opera- 
tions and  operands,  i.e.,  the  same  abstract  machine.  This  guarantees  that 
the  programmer  uses  no  knowledge  about  the  implementation  of  other  parts  of 
the  system  (for  example,  other  modules  or  submodules);  thus,  he  can  make  no 
assumptions about the behavior of other programs that can be changed
easily, where such changes would then result in "incorrect error-handling." This does
not  mean  that  the  UE-handling  uses  exactly  the  same  operations  as  the  nor- 
mal case  program;  it  can  restrict  itself  to  a combination  and  thereby 
reduce  the  probability  of  failure. 

This  requirement  also  says  that  both  programs,  UE-handling  and  normal 
case  program,  run  in  the  same  "environment,"  i.e.,  they  have  access  to  the 
same  data.  It  must  be  guaranteed  that  no  user  of  an  abstract  machine  can 
extend  his  access  rights  by  means  of  UEs  or  obtain  additional  information 
which would otherwise not be accessible to him (see also Chapter 13).

Run-Time  Behavior 

While  the  last  section  deals  with  the  structural  aspects  of  UE- 
handling,  I will  now  consider  some  requirements  for  run-time  behavior. 

With all mechanisms for error-handling, the cost increases with the
frequency of UEs (see Note 1). This proportional cost factor is, with many
mechanisms (e.g., with the "recovery block" mechanism), small compared to
the part which is always necessary, i.e., which is incurred even when no UE
occurs. (I will discuss the "recovery block" concept more thoroughly later
[3][4].)


In  systems  without  UE-handling,  at  least  equivalent  costs  arise  from 
system  breakdown,  loss  of  data,  etc.  It  is  the  aim  of  the  concept  con- 
sidered here  to  minimize  the  run-time  costs  which  originate  from  the 
mechanism  and  which  occur  independently  of  UEs.  This  means:  the  cost  is 
small  as  long  as  no  UE  occurs,  and,  after  the  UE-handling  (successful  or 
unsuccessful),  normal  processing  can  be  continued  as  soon  as  possible 
(i.e.,  there  result  no  unnecessary  delays  because  of  termination  of  the 
processing  and  because  of  repetition  of  already  performed  actions). 

To  reduce  the  costs  when  the  frequency  of  UEs  is  great,  I see  essen- 
tially two  approaches: 

1.  One  provides  for  the  earliest  possible  detection  of  UEs.  In  this 
case,  a quick  reaction  to  such  a UE  is  possible,  and  the  effect  of 
the  UE  on  the  rest  of  the  system  can  be  restricted. 

2.  By  means  of  timely  preparation  for  possible  UEs  and  by  means  of 
sufficient  redundancy  of  data  and  programs  in  a system,  an  occur- 
ring UE  can  be  more  easily  remedied.  If,  for  example,  the  data  on 
the  drum  or  disk  is  frequently  lost,  then  the  cost  of  a UE  to  the 
user  of  the  system  can  be  kept  low  by  means  of  periodic  copies. 
This,  however,  is  based  on  the  assumption  that  the  costs  of  pro- 
ducing copies  are  less  than  the  costs  which  are  associated  with 
the  loss  of  the  data  (which  is  usually  the  case)  (See  also  Chapter 
11.). 
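
The assumption behind the second approach can be made concrete with a small expected-cost comparison. This is only an illustrative sketch: the cost model (one fixed copy cost per period, one loss probability per period) and the function names are assumptions introduced here, not figures or formulas from the thesis.

```python
def expected_cost_per_period(p_loss, cost_of_loss, cost_of_copy=0.0):
    """Expected cost per period: a copy is always paid for, a loss only with probability p_loss."""
    return cost_of_copy + p_loss * cost_of_loss


def copies_pay_off(p_loss, loss_without_copy, loss_with_copy, cost_of_copy):
    """True if making periodic copies lowers the expected cost per period."""
    without = expected_cost_per_period(p_loss, loss_without_copy)
    with_copy = expected_cost_per_period(p_loss, loss_with_copy, cost_of_copy)
    return with_copy < without
```

With a loss probability of 0.01 per period, a loss of 10000 units without a copy, 100 units with one, and a copy cost of 5, the expected cost falls from about 100 to about 6 per period. If losses are rare enough (for example 0.0001 with a loss of 1000 units), the fixed copy cost dominates and copies no longer pay off under this model.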

There  are  two  important  points  regarding  efficiency  with  respect  to 
the  UE-handling  itself  (On  both  points  I will  expound  more  completely 
later.  They  should  only  be  mentioned  here  briefly.).  Firstly,  sufficient 
information  about  the  occurring  UE  must  be  provided.  Without  such  infor- 
mation, reaction  to  the  UE  is  not  possible  or  only  possible  with  great 
difficulty.  This  information  includes  data  about  the  type  of  UE,  the  state 
of  the  abstract  machine  and  the  possibilities  for  further  processing.  This 
information  must  exist  in  a form  which  is  comprehensible  to  the  programmer 
of  the  UE-handling. 
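
As an illustration of the first point, the information handed to a UE routine might be gathered in one record. The sketch below is an assumption about what such a record could look like; the type names UEReport and Continuation and all field names are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass, field
from enum import Enum


class Continuation(Enum):
    """Possibilities for further processing after the UE."""
    CONTINUE = 1
    RETRY = 2
    CLEAR = 3


@dataclass
class UEReport:
    kind: str                 # type of UE, e.g. "parity-error"
    machine_state: dict       # state of the abstract machine, in that machine's own terms
    options: frozenset = field(default_factory=frozenset)  # permitted continuations

    def allows(self, option):
        """Tell the UE routine whether a given continuation is still possible."""
        return option in self.options
```

Keeping the state description in the terms of the abstract machine, rather than in those of its implementation, is what makes the record comprehensible to the programmer of the UE-handling.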

Secondly,  the  data  related  to  an  occurring  UE  should  be  ascertained 
where  it  is  simplest  (cheapest)  to  do  so.  Then,  the  analysis  and  handling 
of  hardware-detected  UEs  frequently  can  be  made  easier  by  means  of  the 
availability of additional bits. The hardware of the Siemens 4004/151
shows that this is not always the case. There, various UEs are combined
under one code, without making available further information for distin-
guishing the individual UEs (although this information is present in the
hardware) (see Chapters 6 and 14). Distinguishing the individual UEs at
higher  levels  (above  the  hardware)  is  only  possible  at  very  great  expense. 
This  can  be  avoided  if  the  lower  levels  (in  this  example,  the  hardware) 
provide  the  data  that  are  already  present  or  easily  obtained;  i.e.,  if  the 
lower  levels  would  make  this  data  accessible. 

NOTE  (Chapter  3)


1.  This  affects  the  processor  time  as  well  as  all  other  operating  re- 
sources required  at  run-time  (e.g.,  additional  memory  for  data  and 
additional  devices).  The  memory  requirement  for  the  code  of  UE- 
routines  is  generally  independent  of  the  frequency  of  the  UEs. 


CHAPTER  7 


THE  REPORTING  OF  UE'S  TO  HIGHER  LEVELS 

In Chapter 3, I required that 1) each programmer provide additional
code for the reaction to UEs, and 2) this additional code use the same
abstract  operations  and  operands  as  the  normal  case.  In  the  UE-handling  of 
one  level,  no  knowledge  about  higher  levels  or  the  implementation  of  pro- 
grams of  other  modules  or  submodules  may  be  used. 

On  the  other  hand,  however,  a greater  knowledge  is  generally  required 
for  sensible  UE-handling  than  is  available  at  one  level.  For  example,  the 
hardware  identifies  an  improper  entry  in  the  address-translation  tables; 
the  programs  that  manage  these  tables  know  to  which  segment  this  entry  be- 
longs; the  user  of  the  virtual  memory  knows  which  data  are  in  the  segment 
and what the response to the loss is (further examples of such combinations
of knowledge are contained in [5]). Without these bits of information,
only  general,  often  drastic  measures  are  possible.  Thus,  many  systems  (in- 
cluding the  Siemens  4004/151)  terminate  execution  of  the  processes  involved 
at  the  occurrence  of  such  a UE. 

The  necessary  combinations  of  knowledge  can  be  achieved  by  two  differ- 
ent methods:  1)  by  means  of  a central  routine,  and  2)  by  means  of  the  re- 
porting of  an  occurring  UE  between  the  levels  and  modules  of  the  system. 

The  use  of  a central  routine  that  combines  all  the  necessary  informa- 
tion and  to  which  all  UEs  are  reported  has  some  serious  drawbacks.  Firstly, 
the  modular  structure  of  the  system  according  to  the  information-hiding 
principle  [2]  would  be  destroyed.  This  routine  must  combine  the  informa- 
tion of  several  of  the  system's  modules.  Each  change  in  one  of  the  modules 
would  necessitate  a change  in  this  routine.  Secondly,  this  routine  must  be 
able  to  access  all  the  data  of  several  modules.  It  would  be  impossible  to 
protect  other  programs  and  data  from  this  routine.  This  is  much  more 
serious,  since  this  central  routine  will  become  very  complex  and,  thereby, 
prone  to  errors.  These  disadvantages  should  surely  be  avoided  by  requiring 
that  each  programmer  prepare  code  for  the  event  that  his  program  or  the 
abstract  machine  used  fails.  If  a UE  is  reported  between  levels,  then 
each programmer can call upon his knowledge in order to determine the
proper measures necessary for the handling of this UE; if the reported UE
cannot be handled at one level, it can be reported up to the next highest
level.

Two  methods  are  available  for  the  reporting  of  a UE  to  higher  levels. 
The first method involves the use of a termination code. Each called program
has  at  least  one  returned  parameter.  This  indicates,  after  termination  of 
the  program,  whether  an  error  (and,  if  necessary,  which  one)  has  occurred 
during  execution  of  the  program.  The  calling  program  can  check  this  para- 
meter and  initiate  appropriate  actions. 
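
A minimal sketch of the termination-code method follows. The segment store, the code values and all function names are hypothetical. The second caller shows what happens when the returned parameter is not examined.

```python
OK, SEGMENT_MISSING = 0, 1


def read_segment(store, name):
    """Return (code, data); the code tells the caller whether a UE occurred."""
    if name not in store:
        return SEGMENT_MISSING, None
    return OK, store[name]


def total_length_checked(store, names):
    """Caller that examines the termination code after every call."""
    total = 0
    for name in names:
        code, data = read_segment(store, name)
        if code != OK:  # this test is executed on every call, error or not
            return code, None
        total += len(data)
    return OK, total


def total_length_unchecked(store, names):
    """Caller that omits the check: a missing segment silently counts as
    empty, so the UE remains undetected."""
    return sum(len(read_segment(store, name)[1] or b"") for name in names)
```

Note that the check in the first caller is interleaved with the normal-case code, and that nothing in the mechanism forces the second caller to perform it.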

This mechanism, however, does not meet the requirements for a concept of
UE-handling that were described in Chapter 3. This mechanism requires
that  the  user  of  a program  check  the  returned  parameter  after  each  call  (in 
case  he  does  not  want  to  take  into  account  the  possibility  that  UEs  remain 
undetected).  Such  a check  involves  an  additional  expense  that  is  not 
acceptable  when  the  frequency  of  errors  is  low,  and  in  addition,  it  renders 
more  difficult  the  desired  separation  between  normal  program  and  UE- 
handling. Moreover, this check can easily be forgotten, which leads to UEs
remaining undetected. This mechanism is based on the premise that each used
program  is  also  called.  In  Chapter  2 [6],  it  was  shown  that  "uses"  and 
"calls"  do  not  always  have  the  same  meaning. 

Another mechanism, one that does meet the stated requirements, depends on
the use of traps, analogous to the reporting of UEs by the hardware (see
Note 1).

"Applicability conditions" are defined for each program at a given level
or  for  each  operation  of  an  abstract  machine;  these  conditions  must  be  met, 
so  that  this  program  can  have  the  specified  effect  (compare  to  Chapter  12). 
Each  abstract  machine  has  the  responsibility  for  recognizing  all  violations 
of  the  applicability  conditions  for  one  of  its  operations.  In  the  event  of 
such  a violation,  control  is  transferred  to  a user-defined  UE-routine  with 
the  aid  of  traps.  This  technique  makes  possible  the  desired  (required) 
separation  of  normal  program  and  UE-handling.  The  user  of  an  abstract  ma- 
chine does  not  need  to  make  any  checks  for  such  UEs  in  his  program.  This 
simplifies the program and reduces the probability that errors go undetec-
ted. An additional consequence of this is that, in general, user errors
can be detected in the action in which they were produced (see Note 2).

Use  of  the  Trap-Mechanism

Each  level  which  is  informed  of  a user  error  checks  the  specifications 
to  determine  if  it,  itself,  caused  this  error  or  if  a higher  level  is 
responsible  for  it.  A sensible  handling  of  the  UE  is  not  possible  at 
intermediate  levels;  only  the  level  in  which  it  was  caused  has  enough 
information  to  take  the  appropriate  actions  to  handle  this  UE.  For 
example,  only  the  program  that  has  attempted  to  read  a segment  that  does 
not  exist  knows  what  should  be  done  with  the  data  that  was  to  be  read  and 
what  should  now  be  done  in  the  event  of  error. 

Also, if no sensible UE-handling (remedy or bypassing of the UE) is
possible  at  the  level  in  which  the  UE  was  caused,  then  the  UE  can  be 
reported  as  a "defect"  (in  case  a further  level  exists).  Termination  of 
the  program  is  always  initiated  only  at  the  highest  level,  i.e.,  at  the 
level  of  the  ultimate  user. 

Defects  depend  on  the  failure  of  hardware  or  software  components,  or 
on  an  error  by  the  operating  personnel.  If  an  abstract  machine  is  informed 
of  a defect,  then  it  can  attempt  to  remedy  it.  If  this  is  unsuccessful  or 
if  the  resources  and  information  available  at  this  level  are  insufficient, 
then  the  defect  is  reported  to  the  next  highest  level. 
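
The reporting of a defect from level to level can be sketched as follows, using the address-translation example given earlier in this chapter. Python exceptions stand in for traps; the class names, the table contents and the two UE routines are hypothetical and serve only to show the control flow.

```python
class Defect(Exception):
    """A UE that the reporting level could not remedy itself."""

    def __init__(self, message, **details):
        super().__init__(message)
        self.details = details


class Terminate(Exception):
    """Raised only at the level of the ultimate user."""


ENTRY_TO_SEGMENT = {12: "S7", 13: "S8"}  # known only to the table manager
RECOMPUTABLE = {"S7"}                    # known only to the user program


def table_manager(defect):
    # Knows which segment a table entry belongs to, but not what the data mean,
    # so it rephrases the report in terms of segments and passes it on.
    segment = ENTRY_TO_SEGMENT[defect.details["entry"]]
    raise Defect("segment lost", segment=segment)


def user_program(defect):
    # Knows what is in the segment and whether it can be redetermined.
    segment = defect.details["segment"]
    if segment in RECOMPUTABLE:
        return "recomputed " + segment
    raise Defect("segment lost and not recomputable", segment=segment)


def report_upward(defect, ue_routines):
    """Offer the defect to each higher level in turn, lowest level first."""
    current = defect
    for name, routine in ue_routines:
        try:
            return name, routine(current)
        except Defect as passed_on:
            current = passed_on  # report adapted to the next level's abstraction
    raise Terminate(str(current))  # no level could react


ROUTINES = [("table manager", table_manager), ("user program", user_program)]
```

A defect in table entry 12 is remedied by the user program, which never sees the table entry itself; a defect in entry 13 reaches the top unhandled and leads to termination.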

Concerning  this  passing  of  information  about  UEs,  two  points  are  note- 
worthy: 1)  the  report  to  the  next  highest  level  must  be  adapted  to  the 
abstraction  of  this  level,  i.e.,  the  report  may  not  refer  to  any  infor- 
mation that  is  not  known  to  the  programs  of  this  level  (Thus,  the  reports 
from  programs  of  a virtual  memory  mechanism  to  its  user  may  not  refer  to 
real  addresses.),  and  2)  at  each  level,  an  attempt  must  be  made  to  lead  the 
appropriate abstract machine into a "possible" state (see Note 3). No user
of an abstract machine should receive control, if the machine that is used
finds itself in a state in which the relationships between the operations
and operands of the abstract machine are not valid (see Note 4). Otherwise,
new UEs probably can and will arise, UEs whose cause then becomes
considerably more difficult to determine. Krakowiak and Kaiser [7] describe
such an error. It appears in conjunction with the synchronization of
parallel processes. The error mentioned there can be described as follows,
in a simplified manner and with respect to the terminology used here: Part
of an abstract
machine  is  implemented  with  the  aid  of  a critical  section,  i.e.,  without 
parallelism.  A UE  occurs  in  the  midst  of  this  program  segment.  If  the  UE 
were now reported back to the user, who knows nothing about this synchro-
nization, without first leaving the critical section (i.e., while parallelism
is still not allowed), then certainly new errors would arise. Thus, it is
absolutely  necessary  that  the  abstract  machines  be  put,  before  the  transfer 
of  control,  into  a state  in  which  all  relations  are  valid,  relations  which 
are  specifiable  for  the  machine  and  provable  with  the  aid  of  the  specifica- 
tions. If  a UE  is  so  catastrophic  that  this  transfer  is  not  possible,  or 
possible  only  for  a portion  of  the  abstract  machine,  then  this  must  be 
reported  to  the  user.  Each  further  use  of  this  abstract  machine  must  then 
be  prevented,  or  no  responsibility  can  be  assumed  for  the  consequences  of 
such  a use. 
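
The requirement that an abstract machine be returned to a "possible" state before control is transferred can be sketched as follows. The Ledger class, its invariant and the deliberately late check are contrived assumptions made for illustration; the point is only the order of events: restore the invariant, leave the critical section, then report the UE.

```python
import threading


class Ledger:
    """Hypothetical abstract machine whose invariant is total == sum(parts)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.parts = []
        self.total = 0

    def invariant_holds(self):
        return self.total == sum(self.parts)

    def add(self, value):
        with self._lock:  # critical section; the lock is released even if a UE occurs
            self.parts.append(value)
            try:
                if value < 0:
                    # UE in the midst of the critical section
                    raise ValueError("negative value")
                self.total += value
            except ValueError:
                self.parts.pop()  # first restore a "possible" state ...
                raise             # ... and only then report the UE to the user
```

After a failed call the user regains control with the lock free and the invariant intact, so the machine can continue to be used.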

Avoiding  Redundant  Error-Checking 

An  abstract  machine  can  also  delegate  the  responsibility  for  the 
recognition  of  user  errors  to  lower  levels.  If,  for  example,  a parameter 
is  passed  through  several  levels  to  a program  of  a lower  level,  then  one 
can  avoid  redundant  error-checking  by  checking  this  parameter  only  at  the 
lowest  level.  If  an  error  is  determined  there,  then  the  trap  mechanism  is 
used  in  order  to  report  this  UE  back  to  the  level  at  which  it  was  caused. 
In this manner, the cost can be reduced, as long as no UE occurs. If,
however,  the  probability  of  UEs  is  very  large,  then  such  an  economization 
is  no  longer  possible.  The  cost  for  the  reporting  back  of  a UE  has,  then, 
a strong  impact.  It  would  be  more  favorable,  in  this  case,  to  check  the 
validity  of  the  applicability  conditions  at  each  level,  in  order  to  permit 
fast  detection  of  UEs. 

Generally,  one  will  check  all  applicability  conditions  in  the  early 
phases  of  the  use  of  a system;  if  UEs  occur  only  seldom,  then  some  checks 
can  be  removed. 
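
The trade-off between checking at every level and checking only at the lowest level can be sketched as follows. The three-level structure, the counters and all names are hypothetical; the counters merely make visible what each policy costs and how far an improper parameter travels before it is detected.

```python
class ApplicabilityViolation(Exception):
    """Hypothetical trap for a violated applicability condition."""


class Levels:
    def __init__(self, check_every_level):
        self.check_every_level = check_every_level
        self.checks = 0         # checks executed (the cost when no UE occurs)
        self.depth_reached = 0  # how deep the parameter travelled before detection

    def _check(self, size):
        self.checks += 1
        if size <= 0:
            raise ApplicabilityViolation("size must be positive")

    def top(self, size):
        self.depth_reached = 1
        if self.check_every_level:
            self._check(size)
        return self.middle(size)

    def middle(self, size):
        self.depth_reached = 2
        if self.check_every_level:
            self._check(size)
        return self.bottom(size)

    def bottom(self, size):
        self.depth_reached = 3
        self._check(size)  # the lowest level always checks
        return bytearray(size)
```

With checking delegated to the lowest level, a valid call costs one check instead of three, but an invalid parameter is detected only after it has passed through every level and the UE must be reported back up.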


NOTES  (Chapter  7) 
1.  On most available hardware machines, the normal execution sequence is
interrupted and control is transferred to a predetermined location
(chosen by the user or by the hardware) at the occurrence of a UE.

2.  If a user error is not immediately detected, and if, therefore, this
leads to an incorrect state in the abstract machine, then it will
later be detected as a defect of the abstract machine.

3.  One can specify for each module or submodule a set of relations which
must always be met. One can prove these relations, in that the
specifications are considered as a group of axioms. Since each
abstract machine is composed of a combination of submodules, I will
designate a state of the machine in which all relations hold as a
"possible" state; a state in which these relations are not valid for
the entire machine or for a part of the machine is accordingly
designated as an "impossible" state (for more details, especially of
the proof, refer to [8]).

4.  It is noteworthy that these invariant predicates are not identical to
the specified behavior, but, rather, only secure the contents of the
state description. They can therefore still be fulfilled if the desired
behavior can no longer be achieved. So, for example, a length and
access rights must be defined for each existing segment. If this
segment description is destroyed, then the validity of this predicate
can be restored by designating this segment as "non-existent."

CHAPTER  12

HARDWARE/SOFTWARE  INTERFACE 

In  this  chapter,  principles  for  the  handling  of  UEs  (undesired  events) 
at  the  interface  between  the  hardware  and  software  are  discussed.  They  are 
applied  to  the  interfaces  of  existing  machines  (the  Siemens  4004/151  and 
also, in less detail, the IBM/360, IBM/370 and PDP-11). Based on this dis-
cussion, we make recommendations for future hardware and software inter-
faces to offer better support of UE-handling.

In  support  of  these  recommendations,  several  examples  taken  from  the 
application  of  the  concepts  in  the  implementation  of  a minimal  subset  of 
the  BSF  (an  operating  system  being  developed  in  Darmstadt,  West  Germany)  on 
a Siemens  4004/151  are  cited.  Details  of  our  experience  with  this  im- 
plementation are  given  in  Chapter  13. 

Undesired  Events 

In  order  to  allow  UE-handling  by  the  user  of  an  abstract  machine,  he 
must  know  the  UEs  that  are  possible.  With  regard  to  the  hardware,  this 
means  that  all  applicability  conditions  for  the  operation  of  a real  machine 
and  all  the  externally  distinguishable  hardware  errors  must  be  explicitly 
described.  What,  then,  are  the  possible  UEs  in  a hardware  machine?  In 
general,  all  events  that  require  a special  action  (an  undesired  action  that 
is  not  necessary  in  a normal  program  run)  are  considered  UEs. 

Errors  in  individual  switching  circuits,  wires,  etc.  (e.g.,  parity 
errors)  and  power  failure  comprise  one  group  of  UEs.  These  are  defects  in 
a real  machine. 

Among  the  user  errors  are:  improper  use  of  arithmetic  operations,  the 
use  of  undefined  or  forbidden  operations,  the  specification  of  an  undefined 
or improperly aligned address, etc. If the real machine has a memory-
protection mechanism or address-translation hardware, then additional
applicability  conditions  apply  to  the  particular  operations;  the  violation 
of  those  restrictions  also  represents  a UE.  The  hardware  is  responsible 
for the recognition of such a violation of the applicability conditions and
for  the  reporting  of  a detected  UE  to  the  user. 

To allow and to support user reaction to UEs, the hardware/software
interface  should  contain  a clear  (precise)  specification  of  the  effect  of 
the  available  operations.  All  events  which  prevent  the  execution  of  a 
specified  action  should  be  treated  as  UEs  and  reported  by  means  of  traps. 

I/O errors are a typical example of UEs that are not reported on most
machines by the use of traps. Another example is fixed-point overflow on
the PDP-11. In both cases, the appearance of a UE is noted by special val-
ues in the channel registers or in a program status register. As a result,
the  user  himself  must  check  for  the  occurrence  of  a UE  in  his  own  program. 
Firstly,  this  increases  the  complexity  of  the  program;  secondly,  the  prob- 
ability that  UEs  go  undetected  is  increased.  Further,  the  checking  results 
in  additional  costs  that  are  also  incurred  in  normal  situations. 

Classes  of  UEs 

As  mentioned  in  the  previous  chapter,  it  is  sensible  (for  efficiency 
reasons)  if  the  UEs  of  an  abstract  machine  are  grouped  into  classes  for  the 
purpose of reporting their occurrence to the user. The number of classes
and the inclusion of a particular UE in one of the classes are dependent
upon the interface between abstract machine and user. No information about
higher  levels  and  the  interface  between  these  levels  may  be  used. 

Consequently,  the  classification  of  hardware-detected  UEs  should  de- 
pend only  on  the  hardware/software  interface,  and  not  on  an  interface  as- 
sumed to  exist  between  operating  system  and  user. 

Below is a classification of UEs in the central processor (I/O errors
are strongly dependent upon the available devices; therefore, a generally
valid classification would be difficult to construct). If one considers
contemporary  interfaces,  then  one  can  distinguish  the  following  nine 
classes  of  hardware-detected  UEs.  The  first  seven  classes  summarize  user 
errors  for  a hardware  function.  Classes  eight  and  nine  contain  hardware 
defects. 

1.  Improper addressing: The specified operation cannot be executed
with  the  operand  address  (real  or  virtual)  that  is  specified.  For 
example:  improper  alignment,  an  address  that  is  outside  of  core 
memory,  or  improper  register  pair. 

2.  Errors  in  address  translation:  The  virtual  operand  or  instruction 
address  cannot  be  translated.  This  class  is  peculiar  to  machines 
that  have  address-translation  hardware.  Examples  of  UEs  in  this 
class:  inconsistent  entries  in  the  translation  table;  the  ad- 
dressed  table  entry  is  not  initialized;  or  the  location  specified 
is  not  in  main  memory. 

3.  Data  Protection  Errors:  The  attempted  access  of  a data  element  is 
not  allowed  by  the  defined  access  rights.  This  class  of  UEs  de- 
pends upon  the  access  rights  that  are  distinguished  in  the  hard- 
ware. 

4.  Unimplemented  Operations:  The  specified  operation  does  not  exist. 

5.  Data  Errors  in  Decimal  Operations:  An  operand  in  a decimal  opera- 
tion is  incorrect  (e.g.,  improper  overlap). 

6.  UEs  in  Fixed-Point  Operations:  For  example:  overflow,  underflow, 
or  division  errors. 

7.  Floating-Point  Errors:  Exponent  overflow  and  underflow;  a mantis- 
sa which  is  approximately  equal  to  zero. 

8.  Hardware  Errors:  Defects  of  the  hardware,  such  as  parity  errors. 

9.  Power  Failure. 

This  classification  is  not  complete;  i.e.,  not  all  of  the  UEs  found  in 
existing  machines  can  be  put  into  one  of  the  above  categories.  At  times, 
additional  classes  are  required,  depending  upon  the  complexity  of  the  ma- 
chine under  consideration  (e.g.,  UEs  for  one  of  the  hardware-handled  stacks 
and  emulator  errors).  The  above  list  gives  only  the  most  important  classes 
which  can  be  defined  for  most  of  the  currently  available  computers. 
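
For illustration, the nine classes above can be written down as an enumeration that also records the split into user errors (classes one through seven) and hardware defects (classes eight and nine). The identifier names are hypothetical; only the ordering and the grouping are taken from the list.

```python
from enum import Enum


class UEClass(Enum):
    IMPROPER_ADDRESSING = 1
    ADDRESS_TRANSLATION = 2
    DATA_PROTECTION = 3
    UNIMPLEMENTED_OPERATION = 4
    DECIMAL_DATA = 5
    FIXED_POINT = 6
    FLOATING_POINT = 7
    HARDWARE_ERROR = 8
    POWER_FAILURE = 9

    @property
    def is_user_error(self):
        """Classes 1-7: the user violated an applicability condition."""
        return self.value <= 7

    @property
    def is_defect(self):
        """Classes 8-9: the real machine itself failed."""
        return self.value >= 8
```

A UE routine can then branch first on the user-error/defect distinction and only afterwards on the individual class, without relying on any interface above the hardware/software boundary.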

Classification  Schemes  in  Existing  Machines 

If  one  considers  existing  hardware/software  interfaces,  then  one  gen- 
erally finds  another  classification:  The  Siemens  4004/151,  which  has 
address-translation  hardware,  distinguishes  six  different  traps  for  UEs 
arising from arithmetic operations, while all UEs in addressing, address
translation and data protection and some other UEs (e.g., emulator-trap)
are divided into two UE-groups. These two groups are further subdivided
into: 1) user addressing errors, 2) system addressing errors, 3) user
address-translation errors, 4) system address-translation errors, and 5)
memory-protection errors. Here, system refers to the operating system.

Such a "classification" supports reaction to fixed- and floating-point
arithmetic, but renders more difficult the analysis and handling of the
remaining UEs (Note 2). This occurs also in the implementation of the minimal
subset  of  BSF  [9].  The  difficulties  arising  from  this  implementation  are 
discussed  in  Chapter  14. 

Division Between Normal Program and UE-Handling

Most hardware machines have an interrupt mechanism by means of which
the currently executing command sequence in the central processor can be
interrupted, followed by a branch to a predetermined address. An
interruption of this type can be caused by four groups of events: 1)
synchronous UEs, i.e., UEs which occur at the time of execution of an
instruction (e.g., overflow) (Note 3); 2) normal synchronous events, i.e.,
events caused by the currently executing program (SVC); 3) normal
asynchronous events (e.g., termination of an I/O operation at the request
of the console operator); and 4) asynchronous UEs (e.g., I/O errors).
Groups (2) and (3) include normal (desired) events; reaction to these
events is part of normal processing and should therefore be separate from
the reaction to UEs (Note 4).

On  most  currently  used  machines,  the  same  mechanism  is  used  for  all  of 
these  events;  i.e.,  a deviation  from  the  processing  sequence  is  forced  and 
control  is  switched  to  a routine  specified  previously.  The  address  of  this 
routine  may  be  taken  from  a register  or  may  be  a fixed  place  in  memory.  The 
number  of  distinct  routines  varies  from  machine  to  machine. 

On the Siemens 4004/151, only one routine can be defined. All
interruptions transfer control to this routine. On the IBM/360, separate
routines can be defined for groups (1) and (2); however, for groups (3)
and (4), only one routine definition is assigned. This means that a
separation between a normal program (here, the handling of events in groups
(2) and (3)) and UE-handling cannot always be maintained. When this
separation is possible, it can only be realized with the help of a virtual
machine. Thus, additional software is required in order to undo the
combination of the two classes by the hardware. This results in the
following requirement for hardware/software interfaces: The
hardware/software interface should at least allow for a separate routine
for each of the four classes of events. This can be achieved by a minimum
of four separate registers or memory locations, in which the addresses for
these routines are kept.
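
The requirement above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the names EventGroup,
RoutineTable and is_normal_processing are hypothetical, and the dictionary
merely stands in for the four registers or memory locations that would hold
routine addresses on a real machine.

```python
from enum import Enum
from typing import Callable, Dict


class EventGroup(Enum):
    """The four groups of interrupting events named in the text."""
    SYNCHRONOUS_UE = 1       # traps, e.g. overflow
    SYNCHRONOUS_NORMAL = 2   # e.g. SVC
    ASYNCHRONOUS_NORMAL = 3  # e.g. termination of an I/O operation
    ASYNCHRONOUS_UE = 4      # e.g. I/O errors


Routine = Callable[[str], str]


class RoutineTable:
    """Four slots standing in for four registers or memory locations."""

    def __init__(self) -> None:
        self._slots: Dict[EventGroup, Routine] = {}

    def define(self, group: EventGroup, routine: Routine) -> None:
        self._slots[group] = routine

    def interrupt(self, group: EventGroup, event: str) -> str:
        # Real hardware would branch to the address in the slot;
        # this sketch simply calls the routine stored there.
        return self._slots[group](event)


def is_normal_processing(group: EventGroup) -> bool:
    """Groups (2) and (3) are desired events handled by the normal program."""
    return group in (EventGroup.SYNCHRONOUS_NORMAL,
                     EventGroup.ASYNCHRONOUS_NORMAL)
```

With one slot per group, the routine for a trap and the routine for a
normal asynchronous event can be defined independently, so no additional
software is needed to separate them again.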

Number  of  UE-Routines 

For  the  separation  of  normal  program  and  UE-handling,  it  is  sufficient 
that  a distinct  routine  be  defined  for  each  of  these  groups.  The  question 
is  whether,  for  example,  restricting  the  handling  of  all  synchronous  UEs  to 
a single  trap-routine  is  possible  or  sensible.  The  answer  to  this  question 
depends  heavily  on  costs  and  the  complexity  of  the  hardware.  For  a machine 
that  only  allows  fixed-point  arithmetic,  a trap-routine  is  sufficient.  But 
what  happens  when  the  machine  has  fixed-  and  floating-point  arithmetic, 
address-translation  hardware,  etc.? 

In  general,  the  UE-routines  are  simpler  when  they  handle  fewer  UEs. 
In  the  extreme  case,  one  would  prepare  a separate  routine  for  each  UE. 
This,  however,  is  unsuitable  for  two  reasons:  (1)  In  many  cases,  the  num- 
ber of  UEs  is  too  large.  The  expense  involved  in  the  definition  of  the 
UE-routines  would  be  very  high  and  would  be  especially  inappropriate  for 
small,  inexpensive  machines.  (2)  Such  a solution  would  frequently  lead  to 
unnecessary  redundant  coding,  since,  in  many  cases,  the  same  or  similar 
actions  are  necessary  (e.g.,  the  saving  of  certain  registers,  similar 
steps). A feasible compromise exists: grouping related UEs together and
handling  them  by  means  of  a general  UE-routine.  A classification  of  syn- 
chronous UEs  for  currently  used  machines  has  already  been  given.  A run- 
time parameter  can  be  used  to  designate  individual  UEs  within  a group. 
With this in mind, the hardware/software interface should set up, for each
group, a register or memory location which contains the address of the
UE-routine for this particular class.

Delayed Traps

If  processors  are  used  jointly  by  several  processes  (successively), 
and  if  the  processor  can,  due  to  the  occurrence  of  an  asynchronous  event 
(end  of  time-slice,  termination  of  an  I/O  operation,  request  for  a device), 
be  withdrawn  from  the  currently  running  program  and  assigned  to  another,  a 
further  problem  arises.  Synchronous  UEs  should  be  handled  by  the  process 
in  which  they  occur.  A process  should  certainly  not  be  delayed  by  UEs  that 
do  not  concern  it  and  which  it  is  not  prepared  to  handle. 

Let  us  consider  an  example:  Assume  that  an  address-translation  error 
caused  a trap.  Successful  handling  of  the  error  depends  upon  the  avail- 
ability of  the  translation  table  which  led  to  the  occurrence  of  the  UE.  If 
these  tables  were  exchanged  in  the  meantime  (e.g.,  by  a process  change), 
analysis  and  handling  are  made  very  difficult. 

If  one  considers  current  hardware/software  interfaces,  both  synchro- 
nous and  asynchronous  events  are  indicated  by  the  setting  of  a bit  in  an 
interrupt register. Which of the events leads to an interruption (i.e.,
causes  the  transfer  of  control  to  a certain  routine)  depends  upon  the 
actual  interrupt  mask  and  on  the  priorities  of  the  individual  interrupts. 
If  asynchronous  events  have  higher  priorities  than  synchronous  ones,  as, 
for  example,  on  the  Siemens  4004/151,  it  is  possible  that  the  reporting  of 
a synchronous  UE  will  be  delayed  until  the  handling  of  a simultaneously 
occurring asynchronous UE is completed. If the latter contains a process
change, which is frequently the case, the UE is reported in the wrong
process (Note 5). To avoid this, one must consider delayed synchronous events
as  part  of  the  process  data  and  exchange  them  in  a process  change.  In  most 
currently  used  machines,  this  requires  a reloading  of  a part  of  the  inter- 
rupt register;  the  actual  contents  of  part  of  this  register  must  be  stored 
away  with  the  other  data  (e.g.,  registers)  of  the  unloaded  process  and  the 
new  contents  gathered  from  the  process  data  of  the  process  being  loaded. 
Also,  the  additional  information  about  an  occurring  UE  which  is  stored  in 
registers  must  be  saved  and  reloaded. 
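
The bookkeeping described above can be sketched as follows. This is a
minimal illustration, not the mechanism of any particular machine: the
classes ProcessData and Processor and the function process_change are
hypothetical names, and the list of pending traps stands in for the
relevant part of the interrupt register.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessData:
    """Per-process save area, including delayed synchronous UEs."""
    registers: Dict[str, int] = field(default_factory=dict)
    pending_traps: List[str] = field(default_factory=list)
    ue_info: Dict[str, int] = field(default_factory=dict)


@dataclass
class Processor:
    """The parts of the processor state that belong to one process."""
    registers: Dict[str, int] = field(default_factory=dict)
    pending_traps: List[str] = field(default_factory=list)
    ue_info: Dict[str, int] = field(default_factory=dict)


def process_change(cpu: Processor, outgoing: ProcessData,
                   incoming: ProcessData) -> None:
    # Store away the state of the unloaded process, including traps that
    # were signalled but not yet reported and the information about them.
    outgoing.registers = dict(cpu.registers)
    outgoing.pending_traps = list(cpu.pending_traps)
    outgoing.ue_info = dict(cpu.ue_info)
    # Load the state of the process being loaded, so that its own delayed
    # traps (and only those) are reported once it runs again.
    cpu.registers = dict(incoming.registers)
    cpu.pending_traps = list(incoming.pending_traps)
    cpu.ue_info = dict(incoming.ue_info)
```

Treating the pending traps as ordinary process data is what keeps a
delayed trap from being reported in the wrong process.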

What effect does such an interface have on the system? (1) The program
for the process change becomes more complicated and more costly. (The
storing away and the setting of particular bits in a register is in
general very expensive and requires great care on the part of the
programmer (Note 6).) (2) For the process change, a detailed knowledge of
possible traps and the registers used for provision of run-time
information is necessary.

To  avoid  these  difficulties,  the  hardware/software  interface  should 
either  assign  traps  higher  priority  than  interrupts  or  support  the  storage 
of the (information absent) delayed traps and UE-information, while
providing operations for the storing and retrieval of this data. It would be
best,  therefore,  if  the  reaction  to  synchronous  UEs  (precisely,  the  part  of 
the  reaction  which  requires  a ban  on  interrupts)  were  as  short  as  possible, 
so that asynchronous UEs do not have to be delayed too long.

"Environment" for UE-Handling

For  abstract  machines,  I required  that  each  programmer  (or  each  group 
of  programmers)  who  writes  a program  for  this  machine  prepare  additional 
code  for  the  case  where  the  original  program  has  errors  or  where  the  ab- 
stract machine  is,  for  some  reason,  not  able  to  execute  this  program.  This 
UE-handling  should  use  the  same  abstract  operations  and  data  structures  as 
the  original  program;  i.e.,  it  should  work  in  the  same  "environment."  The 
user  is  not  allowed,  by  means  of  UE-handling,  to  obtain  additional  infor- 
mation that  is  normally  unknown  or  protected. 

For  the  hardware  machine,  this  means  that  UE-handling  by  the  user 
should  have  the  same  privileges  and  the  same  access  rights  as  the  user's 
normal  program.  This  environment  must  be  produced  by  the  occurrence  of  a 
UE.  The  associated  information  must  be  already  stored  in  special  registers 
or  memory  locations.  The  hardware/software  interface  should  support  the 
explicit  definition  of  the  environment  for  each  interrupt  routine  and  the 
automatic  production  of  this  environment  at  the  occurrence  of  the  corre- 
sponding interrupt.  In  addition  to  the  register  or  memory  location  for  the 
address, for each group of interrupts, additional registers or memory
locations, whose contents indicate the environment for the appropriate
interrupt routine, should be available.

The example of the minimal subset of BSF [9][10][11] shows that it is
not  sufficient  for  the  values  of  the  control  register  to  be  retained  at  the 
occurrence  (or  detection)  of  a UE.  This  minimal  subset  makes  available  a 
virtual  memory  and  the  operations  for  this  virtual  memory.  Since  the  Sie- 
mens 4004/151  has  address-translation  hardware,  no  software  is  necessary 
for  address  translation.  The  control  registers  thus  always  indicate  the 
environment  of  the  actual  user  of  the  minimal  subset,  not  the  environment 
in  which  the  address  translation  exists.  The  reaction  of  the  minimal  sub- 
set to  hardware-detected  UEs  must,  however,  be  implemented  by  software  rou- 
tines. These  routines  must  work  with  real  addresses  and  have  access  to  the 
translation  table.  Thus,  the  production  of  the  necessary  environment  must 
be  done  explicitly,  i.e.,  by  means  of  software. 

As  already  mentioned,  the  hardware  of  the  Siemens  4004/151  transfers 
control  to  the  same  routine  for  synchronous  and  asynchronous  events;  this 
routine  works  in  a particular  program  state.  The  result  of  this  is  that 
these general routines: 1) function with real addresses (interrupt
handling can also function with virtual addresses), and 2) must have access
to all data which are necessary either for UE-handling or for
interrupt-handling. Because of these maximal access rights, this routine
represents one of the critical parts of the system (Note 7).

If  the  proposed  additional  registers  or  memory  locations  were  pro- 
vided, the  environment  necessary  for  each  routine  could  be  produced  di- 
rectly from  the  hardware.  A special  software  routine  with  maximal  access 
rights  would  not  then  be  necessary. 
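
A minimal sketch of this proposal is given below. It assumes a deliberately
simplified notion of "environment" (a privilege flag and an addressing
mode); these two fields, and the names Environment, VectorEntry,
ControlState and take_interrupt, are illustrative assumptions and do not
describe the control registers of the Siemens 4004/151.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Environment:
    privileged: bool           # hypothetical privilege flag
    virtual_addressing: bool   # hypothetical addressing mode


@dataclass(frozen=True)
class VectorEntry:
    routine_address: int       # where the routine for this group starts
    environment: Environment   # environment the routine must run in


@dataclass
class ControlState:
    program_counter: int
    environment: Environment


def take_interrupt(state: ControlState, vector: Dict[str, VectorEntry],
                   group: str) -> ControlState:
    """Install the routine address and its environment in one step.

    Returns the old control state so that it can be stored away and
    restored later.
    """
    old = ControlState(state.program_counter, state.environment)
    entry = vector[group]
    state.program_counter = entry.routine_address
    state.environment = entry.environment
    return old
```

Because each entry carries its own environment, a user-level trap routine
can run with user rights while an interrupt routine runs with real
addresses, without a general routine that holds maximal access rights.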

Relationship Between UE and UE-Handling

Each  abstract  machine  can  be  used  by  several  different  programs  at 
higher  levels.  Each  "user"  can  have  his  own  UE-handling.  As  discussed  in 
Chapter  6,  the  actual  user's  provision  and  the  report  of  the  UE  to  this 
user  can  be  supported  by  a dynamic  relationship  between  UE  and  UE-routine. 
In  hardware  machines,  the  relationship  between  UE  and  UE-routine  can  be 
changed,  when  one  changes  the  trap-address  (i.e.,  the  address  of  the  trap 
routine) at run-time.

Changing of the trap address should be supported by the hardware so
that no knowledge of other parts of the system (e.g., interrupt-handling)
is necessary for this change. In most currently available
hardware/software interfaces, the above feature is not included. In the
Siemens 4004/151, the changing of the trap address means simultaneously
changing the interrupt address (i.e., it defines a new interrupt routine).
On the IBM 360 and the PDP 11, the trap address can indeed be changed
without influencing the handling of other interrupts; however, the trap
addresses and
the  addresses  of  the  interrupt-handling  routines  are  in  the  same  memory 
area.  The  access  rights  which  are  necessary  for  the  changing  of  the  trap 
address  also  allow  access  to  the  addresses  of  the  other  interrupt-handling 
routines.  If  part  of  the  data  should  be  changed  by  a program,  it  is  not 
possible  to  protect  the  rest  of  the  data  against  unauthorized  changes  by 
the  program. 

In  order  to  support  the  information-hiding  principle  [2]  and  for  data 
protection  among  programs,  the  hardware/software  interface  should  provide 
separate  data  areas  with  individual  or  separate  access  rights  that  can  be 
specified  singly. 
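
The following sketch illustrates the idea of separately protected data
areas. It is an assumption-laden model, not a description of any existing
interface: the class ProtectedArea and the program names are hypothetical,
and write access is modelled as a simple set of permitted programs.

```python
from typing import Dict, Set


class ProtectedArea:
    """A small data area whose write access is granted per program."""

    def __init__(self, name: str, writers: Set[str]) -> None:
        self.name = name
        self._writers = set(writers)
        self._data: Dict[str, int] = {}

    def write(self, program: str, key: str, value: int) -> None:
        # Access rights are checked per area, so the right to change one
        # area implies nothing about any other area.
        if program not in self._writers:
            raise PermissionError(f"{program} may not change {self.name}")
        self._data[key] = value

    def read(self, key: str) -> int:
        return self._data[key]
```

If the trap addresses and the interrupt addresses are kept in two such
areas, a user program can be allowed to redefine its trap routine without
gaining the ability to redirect interrupt handling.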

Resumption  of  Normal  Execution 

In  Chapter  5,  I introduced  three  functions  which  should  be  available 
to  the  user  of  an  abstract  machine  for  the  resumption  of  normal  execution: 
CONTINUE,  RETRY  and  CLEAR. 

If one considers currently used hardware/software interfaces, none
offers  a corresponding  set  of  functions.  The  only  way  to  implement  the 
functions  CONTINUE  and  RETRY  is  to  explicitly  reset  the  old  (saved)  value 
of  the  instruction  counter  (i.e.,  the  value  at  the  time  that  the  UE  occur- 
red) in  the  actual  instruction  counter  (by  loading  the  appropriate  regis- 
ter). All  of  these  actions  must  be  performed  by  software.  In  most  cases, 
there  are,  in  addition  to  the  instruction  counter,  various  other  control 
registers  to  load,  in  order  to  recreate  the  environment  of  the  interrupted 
program.  The  only  machine  known  to  me  which  allows  one  to  take  control  of 
the instruction counter is the PDP 11. The hardware/software interface
of this machine offers an operation which exchanges the old stored value of
the  instruction  counter  with  the  actual  value  of  the  instruction  counter 
and  thereby  allows  the  resumption  of  the  interrupted  program.  This 
represents  an  important  feature  of  the  software. 

Hardware/software  interfaces  should  supply  a special  mechanism  (in- 
struction) for  the  resumption  of  an  instruction  sequence  which  has  been 
interrupted  by  a trap. 

An  essential  difference  between  general  abstract  machines  and  hardware 
machines  is  that  resumption  of  an  interrupted  hardware  operation  by  the 
hardware  is  not  usually  possible.  Before  the  notification  of  the  presence 
of  a UE  to  the  user  (i.e.,  before  the  occurrence  of  a trap),  the  currently 
executing  operation  is  terminated;  continuation  is  possible  only  with  a new 
operation.  This  property  is  certainly  efficient,  when  it  handles,  as  is 
the  case  with  most  hardware  commands,  simple  operations  with  short  execu- 
tion time.  The  situation  is  different  when  a machine  performs  complex 
operations.  Let  us  consider,  for  example,  a machine  with  address-transla- 
tion hardware.  In  such  a machine,  command  execution  is  carried  out  in  two 
parts:  1)  translation  of  the  virtual  address  into  a real  one;  2)  execu- 
tion of  the  operation  with  the  operands  which  are  designated  by  the  real 
addresses.  In  most  machines,  a UE  in  one  of  the  two  parts  leads  to  termi- 
nation of  the  currently  executing  instruction  and  to  the  occurrence  of  the 
trap. 

Normally,  no  memory  location  or  register  is  changed  during  address 
translation  (except  some  that  are  internally  used  and  not  externally  no- 
ticeable). After  removing  the  cause  of  the  UE,  it  would  be  possible  in 
principle  to  continue  or  repeat  the  interrupted  operation.  Such  a resump- 
tion or  repetition  of  an  interrupted  operation  is  not  supported  in  current 
hardware/software  interfaces.  The  resetting  of  the  instruction  counter, 
and  eventually  other  registers,  must  be  carried  out  explicitly  by  the  user 
of  the  hardware.  Here  also  the  PDP  11  shows  a step  in  the  right  direction. 

The  designers  of  the  hardware  recognized  the  necessity  of  such  a possi- 
bility and  gave  a comprehensive  description  of  the  actions  required.  The 
programming  of  these  actions  is,  however,  left  to  the  user.  Stronger  sup- 
port by  the  hardware  would  lead  to  greater  efficiency  and  to  a lower  prob- 
ability of  errors. 

Hardware/software interfaces should supply mechanisms that can be
called  upon  to  continue  or  repeat  a partially  completed  operation  after  the 
cause  of  the  UE  is  removed. 

Defects  of  Hardware  Functions 

Some  actions  of  an  abstract  machine  can  also  be  carried  out  by  the 
user  of  the  machine  with  the  aid  of  other  operations  of  the  same  machine. 
For  example,  with  the  help  of  integer  arithmetic  in  FORTRAN,  signal  proces- 
sing can  be  implemented.  This  implementation  is  indeed  very  inefficient, 
but  it  is  theoretically  possible. 

For  real  machines,  similar  examples  can  be  found.  Floating-point 
arithmetic  can  be  implemented  with  the  aid  of  fixed-point  arithmetic.  This 
method  is  less  efficient  than  a hardware  implementation,  but  the  results 
are  the  same.  Address  translation  presents  another  interesting  example. 
On a Siemens 4004/151, if the address-translation hardware fails, the
entire system collapses. Its function, the translation of virtual addresses
into  a table  of  available  locations  in  memory,  can  also  be  carried  out  by  a 
program.  These  programs  would  need  more  time  for  the  translation,  but 
would  produce  the  same  results.  In  order  to  allow  for  such  a program, 
there  would  have  to  be  the  possibility  of  communication  of  virtual  and  real 
addresses  between  hardware  and  program.  This  can  happen  if  the  internally 
used  address  registers  can  be  made  available  by  special  commands  or  as 
special  memory  addresses.  Not  all  hardware  components  can  be  replaced  by 
software  programs;  other  hardware  components  have  such  a low  probability  of 
failure  that  special  preparations  for  the  possibility  of  failure  are  not 
worthwhile.  Further,  the  software  implementation  will  certainly  not  be 
considered  a long-term  alternative.  It  can,  however,  bridge  over  short- 
term difficulties  and  permit  an  orderly  shutting  down  of  the  system. 
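
A software stand-in for address translation might look like the sketch
below. It is only an illustration: the block size of 4096 bytes, the
dictionary used as a translation table, and the names HardwareFailure,
TranslationError, translate_in_software and translate are all assumptions
made for this example.

```python
from typing import Callable, Dict, Optional

BLOCK_SIZE = 4096  # bytes per block; an assumption for this sketch


class HardwareFailure(Exception):
    """Raised by the (simulated) translation hardware when it is defective."""


class TranslationError(Exception):
    """The UE reported when a virtual address has no table entry."""


def translate_in_software(virtual_address: int,
                          block_table: Dict[int, int]) -> int:
    """Translate a virtual address by table lookup, as the hardware would."""
    block, offset = divmod(virtual_address, BLOCK_SIZE)
    if block not in block_table:
        raise TranslationError(f"no entry for virtual block {block}")
    return block_table[block] * BLOCK_SIZE + offset


def translate(virtual_address: int, block_table: Dict[int, int],
              hardware: Optional[Callable[[int], int]] = None) -> int:
    """Prefer the hardware; fall back to software if the hardware fails."""
    if hardware is not None:
        try:
            return hardware(virtual_address)
        except HardwareFailure:
            pass  # fall through to the slower software path
    return translate_in_software(virtual_address, block_table)
```

The software path is slower but yields the same real address, which is all
that is needed to bridge a short-term failure and shut the system down in
an orderly way.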

If one must contend with an occasional failure of hardware components,
possible software implementations of these components should not be
excluded. The hardware/software interface should allow for such an
implementation and provide for an interchange of results.

Costs of the Proposed Changes

In this section, the proposed changes to the hardware/software interface
are made concrete, using the Siemens 4004/151 as an example. At the same
time, the costs for an altered interface are analyzed. The Siemens
4004/151 is equipped with four register sets. Each of these sets consists
of the control registers ISR (Interrupt State Register), IMR (Interrupt
Mask Register) and PC (Program Counter), and a set of general-purpose
registers. In addition, the machine is equipped with a BTAR (Block Table
Address Register). The register sets are closely related to the four
program states (hardware errors, interrupt analysis, interrupt handling and
user program). The number of multipurpose registers and the addressing of
the control registers are not the same for all states (Note 8). At the
moment, an interrupt is provided for each channel in the Siemens 4004/151,
by means of which both normal termination and the occurrence of an error
are reported. This existing interface should be changed as follows:

1.  The  events  of  each  of  the  four  groups  (synchronous  normal,  syn- 
chronous undesired,  asynchronous  normal,  asynchronous  undesired) 
are  reported  by  separate  interrupts.  Since  only  two  synchronous 
normal  events  are  possible  (SVC  and  test  operation),  only  two 
interrupts  are  required  for  this  group.  For  each  class  of  syn- 
chronous UEs  (refer  to  the  recommendations  at  the  beginning  of 
this  chapter),  a separate  interrupt  should  be  provided.  In  addi- 
tion to  the  previously  discussed  interrupts  for  normal  asynchronous 
events,  separate  interrupts  for  the  corresponding  asynchronous  UEs 
should also be provided. With respect to the current situation,
this would require three additional interrupts (Note 9).

2. The strict association of program states with register sets is
abolished. Instead, three or four independent sets of multipurpose
registers are proposed. The actual register set is indicated by a bit
combination in one of the control registers. The control registers,
including BTAR, are necessary only in a model. For this change, no
additional registers are required. Only their strict relationship to
program states and special tasks is removed.

3.  In  the  event  of  an  interrupt,  the  control  register  is  reloaded  by 
the  hardware.  For  each  group  of  interrupts,  a separate  area  of 
memory  is  provided.  This  contains  the  new  values  of  the  control 
registers. The old values are stored in a memory location whose
address is stored in a register or a specially-allocated memory
location.  In  this  case,  an  extension  of  the  mechanism  to  a defi- 
nition of  separate  memory  areas  for  each  interrupt  incurs  only 
additional  memory  costs.  For  the  protection  of  this  data  against 
unauthorized  accesses,  it  would  be  best  if  separate  access  rights 
could  be  assigned  for  each  of  these  memory  areas.  This  can  be 
achieved,  for  example,  by  a virtual  memory  which  allows  the  defi- 
nition of  very  small  segments  (several  words).  The  Siemens  4004/ 
151  has  an  address-translation  mechanism;  however,  the  minimum 
segment  size  is  4096  bytes.  The  situation  is  improved  if  one 
allows  a change  of  the  data  for  the  interrupt  routines  (address 
and  environment)  only  through  special,  protected  programs.  The 
correctness and the protection of these programs ensure simultaneous
protection of the data. The problem that traps which have not yet been
reported can be lost is avoided if one assigns synchronous events a higher
priority than asynchronous events. The
costs  for  this  change  are  low.  The  routines  for  reacting  to  syn- 
chronous events  must  be  so  laid  out  that  the  reaction  to  asynchro- 
nous events  is  delayed  as  little  as  possible. 

Two machine commands (CONT and RETRY) can be provided for the resumption
of the interrupted execution and the restoration of the interrupted
state; CONT picks up execution at the place where it was interrupted
(generally, with the next operation in the command sequence) and RETRY
repeats the last command (Note 10). The operands of these operations are:
1) the register set being used, and 2) the address of the memory area in
which the old values of the control and address registers are stored.
If only one register set is available, the first operand is not used. The
cost  for  these  two  additional  commands  is  relatively  low  on  most  of  the 
large  machines. 
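
The intended behavior of the two commands can be sketched as follows. The
model is hypothetical: memory is a dictionary from save-area addresses to
saved states, the saved state records both the interrupted and the
following command address, and the names SavedState, Machine, cont and
retry are chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SavedState:
    interrupted_pc: int      # address of the interrupted command
    next_pc: int             # address of the following command
    control: Dict[str, int]  # old values of the other control registers


@dataclass
class Machine:
    pc: int
    control: Dict[str, int]
    register_set: int = 0


def _restore(machine: Machine, memory: Dict[int, SavedState],
             register_set: int, save_area: int) -> SavedState:
    # Both commands restore the interrupted state from the save area
    # and switch back to the register set named by the first operand.
    saved = memory[save_area]
    machine.register_set = register_set
    machine.control = dict(saved.control)
    return saved


def cont(machine: Machine, memory: Dict[int, SavedState],
         register_set: int, save_area: int) -> None:
    """CONT: resume with the next operation in the command sequence."""
    machine.pc = _restore(machine, memory, register_set, save_area).next_pc


def retry(machine: Machine, memory: Dict[int, SavedState],
          register_set: int, save_area: int) -> None:
    """RETRY: repeat the command that was interrupted."""
    machine.pc = _restore(machine, memory, register_set,
                          save_area).interrupted_pc
```

The two commands differ only in which instruction-counter value they
reload, which is why a single hardware mechanism could serve both.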

In  order  for  address  translation  to  be  carried  out  by  software,  in  the 
event  of  a defect  in  the  address-translation  hardware,  the  internally  used 
registers should be made available either as special registers or as
special words in memory. Upon the occurrence of an address-translation error,
the  currently  executing  operation  would  then  be  interrupted,  not  termina- 
ted. A resumption  of  this  operation  should  then  be  possible  with  CONT  or 
RETRY.  This  change  might,  as  previously  mentioned,  be  controversial;  how- 
ever, it  allows  an  orderly  shutting  down  of  all  system  components  and  even- 
tually some  user  programs,  in  the  event  of  address-translation  failure. 

On  other  machines,  different  criteria  would  be  used  in  considering  the 
costs  for  the  changes  cited  above.  In  the  case  of  the  Siemens  4004/151, 
few  additional  costs  arise;  essentially,  only  unfavorable,  existing  design 
decisions  had  to  be  revised.  In  contrast,  additional  registers  and  com- 
mands in  small,  inexpensive  computers  can  represent  a significant  cost 
factor.  Therefore,  one  must  analyze  in  the  design  of  a machine  all  possi- 
bilities for  achieving  the  ideas  set  forth  in  this  section  and  determine 
the  cost;  in  most  cases,  the  final  design  represents  a compromise  between 
the requirements presented and the cost of implementing them.

NOTES (Chapter 12)

1.  Since  I/O  devices  are  very  slow,  the  costs  for  the  checking  (performed 
mainly  by  the  central  processor)  of  I/O  errors  is  small  compared  to 
the  determination  of  errors  in  the  central  processor.  The  costs  of 
further  interruptions  can  sometimes  be  higher  than  the  costs  of  error 
determination  supplied  by  the  program. 

2.  A similar  classification  exists  for  the  IBM/360,  /370. 

3.  Hereafter,  I will  use  the  term  "traps"  for  interruptions  which  are 
caused  by  events  in  group  (1). 

4.  A separation  between  synchronous  and  asynchronous  events  is  desired, 
since,  in  general,  different  users  of  the  hardware  machine  must  react 
to  these  events.  In  addition,  for  the  reaction  to  synchronous  events, 
a built-in  stack  can  be  used;  for  asynchronous  events,  the  use  of  a 
stack  is  questionable,  since,  for  the  sequence  of  reactions  to  asyn- 
chronous events,  factors  other  than  the  order  of  occurrence  can  be 
decisive. 

5.  In  general,  this  problem  does  not  arise  in  the  case  of  I/O  errors, 
since  it  is  not  normally  possible  to  interrupt  the  currently  executing 
channel  program  and  later  continue.  The  unloading  of  a channel  pro- 
cess occurs  only  as  a reaction  to  an  I/O  error  in  the  execution  of 
the  process  or  after  the  successful  completion  of  the  I/O.  The  multi- 
plex channel  is  an  exception;  however,  the  switching  between  processes 
is  carried  out  completely  by  the  channel  itself. 

6.  Note  that  these  actions  must  be  carried  out  in  such  a way  that  no  in- 
terrupts are  lost. 

7.  These  access  rights,  then,  are  also  required  if  the  general  routine 
implements  a virtual  machine  in  which  traps  and  interrupts  are  sepa- 
rated and  have  different  access  rights.  In  this  case,  the  general 
routine  must  have  the  right  to  confer  any  privileges  and  access 
rights.  It  is,  therefore,  necessary  for  the  routine  itself  to  have 
these  rights. 

8.  For  example,  in  state  P3,  the  control  registers  of  the  other  states 
can  be  regarded  as  multi-purpose  registers. 


9.  If  one  considers  fixed-point,  floating-point  and  decimal  errors  as 
subclasses  of  a general  class  of  UEs,  one  can  manage  with  the  previous 
number  of  interrupts.  Then,  only  the  relationship  of  the  interrupts 
to  the  individual  events  need  be  changed. 

10. On the Siemens 4004/151, RETRY allows only a resetting of the
instruction counter; on the PDP 11/45, it is possible that some address
registers must also be reset.

CHAPTER 14


COSTS OF UE-HANDLING

The  UE-routines  of  the  minimal  subset  occupy  about  five  hundred  32-bit 
words  (including  data).  A direct  comparison  with  the  memory  requirement  of 
the  normal  program  of  the  minimal  subset  is  not  possible,  since  the  address 
translation  is  carried  out  by  the  hardware.  The  only  software  routine  of 
the  minimal  subset,  MAPSELECT,  includes  twenty  commands  (the  memory  require- 
ment for  the  data  depends  on  the  number  of  address  spaces). 

The  number  of  instructions  executed  in  the  UE-case  is  between  fifteen 
and  eighty  machine  instructions,  depending  on  which  UE  occurred.  The  costs 
for  the  production  of  the  UE-routines  of  the  minimal  subset  totalled  ap- 
proximately two  man-months.  The  first  simple  versions  were  completed  in 
two weeks with the aid of a mathematical-technical assistant.

The  relatively  high  costs  of  the  handling  of  some  UEs  resulted  essen- 
tially from  the  following  causes:  the  classification  of  the  UEs  that  were 
reported  by  the  hardware  and  the  information  about  these  UEs  were  insuffi- 
cient. The  classification  came  from  the  interface  between  a complete  oper- 
ating system  and  its  user.  The  minimal  subset  makes  available,  however, 
only  a simple  virtual  memory  with  a fixed  number  of  address  spaces  and  a 
fixed  number  of  segments.  In  order  to  report  UEs  to  the  user  of  the  mini- 
mal subset,  they  must  be  analyzed  separately  and  re-grouped  into  classes. 
This  was  made  more  difficult,  since  the  hardware  provided  insufficient  or 
no  information  about  the  exact  UE  in  a class. 

Some  UEs,  therefore,  could  generally  not  be  identified;  others  had  to 
be  determined  by  the  exclusion  of  all  other  UEs  in  their  class.  An  example 
is  the  previously  mentioned  RESTOK-bit;  this  indicates  whether  an  inter- 
rupted instruction  is  repeatable.  Some  UEs  appear  twice  within  a class  in 
the  classification  of  the  hardware,  once  with  a set  and  once  with  an  unset 
RESTOK-bit. This bit, which would be necessary in order to distinguish
between the two situations, is not accessible, however, outside the
hardware.

This detailed and somewhat complicated analysis of occurring UEs
contributed  substantially  to  the  relatively  high  costs  of  UE-handling  for 
the  minimal  subset.  A lowering  of  the  costs  is  possible  in  two  ways:  1) 
the  hardware /software  interface  is  changed,  or  2)  one  dispenses  with  a 
detailed  analysis  of  UEs.  One  notes  that,  in  the  first  case,  no  change  to 
the  interface  to  the  user  of  the  minimal  subset  is  necessary.  Only  the 
UE-analysis  in  the  UE-routines  of  the  minimal  subset  must  be  changed.  If 
one  dispenses  with  a detailed  analysis,  then  the  possibilities  for  UE- 
handling  at  the  higher  levels  are  thereby  restricted.  A subsequent 
definition  of  an  individual  UE  at  higher  levels  is  generally  impossible  or 
very  expensive*. 

An additional cause for the costs of the UE-handling is the
possibility of a resumption of normal processing. For this, the values of
the control register (and perhaps some other registers) must be saved at
the time of occurrence of a UE and later reloaded. This requires, on the
one hand, memory space for these registers (about 10 words) and, on the
other, additional time for the execution of the corresponding instructions
(about five instructions). If one dispenses with the possibility of
resuming the interrupted command, i.e., if one permits basically only the
use of CLEAR, then these costs can be saved. (If one still wants to resume
normal execution with some UEs, then, in most cases, this will require a
repetition of already successfully executed actions; see Note 2.)

This shows that the costs of UE-handling can also be reduced under the
proposed concept, if one reduces the demands on the UE-handling to a
minimum (all UEs are reported as one, with no possibility of resuming
normal processing). This possibility is not excluded by the proposed method
and, more importantly, the method does not increase the costs. The minimum
cost incurred by implementing the proposed concept is the cost of realizing
the traps that report UEs up to the highest level (where termination can
then result). These costs can be avoided only if, from the beginning, all
reporting and therefore all handling is dispensed with and execution halts
when a UE occurs at one of the lower levels; but such an assumption must
then be made in the entire system and could not be changed later, or only
with great expense and difficulty.

NOTES (Chapter 14)


1.  For example, the UE-routines of the minimal subset can call upon the
    address-translation tables for the analysis of UEs and for
    determination of the affected segment. The programs of higher levels
    have no access to them. The exception is programs of higher levels
    which belong to the same module as the minimal subset.

2.  Note that the costs do not depend on whether one considers only
    CONTINUE or only RETRY as continuation possibilities.

I. INTRODUCTION

The  primary  unit  under  investigation  is  any  integrated-circuit 
(IC)  device  with  simple  functionality.  As  an  example,  consider  an  eight 
bit  full-adder  in  a DIP  ceramic  package.  A certain  reliability  will  be 
associated  with  such  a unit.  The  nature  of  this  reliability  reflects  the 
age  state  of  the  device. 


In  state  "F",  the  user  is  relying  on  a system  that  is  providing 
erroneous  results.  In  order  to  quickly  notify  the  user  of  this  condition, 
and  make  the  necessary  repair,  a "built-in-test"  (BIT)  mechanism  operates 
concurrently  with  the  unit. 

The BIT equipment might take the form of an additional IC component
performing modulo addition on certain bit positions of the operands. It
then would compare its result to certain bit position values of the
full-adder's result. Alternatively, the BIT equipment might be a software
routine.

In any case, the significant characteristic here is that detection
is performed by a "checker" that is on-line. The advantages of having such
detection capability are obvious, and have been studied: "The purpose of
associating a detector with a module is for reducing the propagation and
contamination of errors and also for easing the maintenance, since the
faulty modules are located automatically by themselves. Therefore the
repair time is reduced." [4]

The time it takes the checker to detect a failure is termed the
"latency" [8]. This also is modelled with an exponential distribution.

(3)  L(t) = e^{-\delta t}

where,

"L(t)" is the probability that detection occurs after
"t" units of time.

"\delta" is the rate of detection in (DETECTIONS/HR.).

The reciprocal of the detection rate is the average time it takes
to detect a failure. This quantity is generally referred to as the "mean
time to detect failure" (MTDF).

(4)  MTDF = 1/\delta

The system is now described as having gone from the "failure state"
(F) to the "detected state" (D). See Fig 2.


Fig  2 

Once  the  detected  state  is  reached,  the  necessary  repair  or  replace- 
ment is  done  to  return  the  system  to  the  working  state.  This  regained 
working  state  is  completely  comparable  to  the  previous  working  state.  For 
example,  if  replacement  were  made,  the  new  full-adder  chip  would  have  already 
undergone  burn-in  and  would  be  in  its  useful  life. 

The repair time is also modelled with an exponential distribution.

(5)  M(t) = e^{-\mu t}

where,

"M(t)" is the probability that the repair is made after
"t" units of time.

"\mu" is the repair rate in (REPAIRS/HR.).

The reciprocal of the repair rate is the average time it takes to
make a repair: the "mean time to repair" (MTTR).

(6)  MTTR = 1/\mu

The system is now depicted in Fig 3 [5].


The final event to be considered is the failure of the checker.
This probability is given by:

(7)  C(t) = e^{-\alpha t}

where,

"C(t)" is the probability that a checker failure occurs
after "t" units of time.

"\alpha" is the failure rate in (FAILURES/HR.).

The time to repair the checker is modelled by:

(8)  B(t) = e^{-\beta t}

where,

"B(t)" is the probability that the repair is made after
"t" units of time.

"\beta" is the repair rate of the checker in (REPAIRS/HR.).


The complete system is now shown in Fig 4. [5, 8]


Fig  4 

After the system model, depicted by the Markov chain in Fig 4,
has been running a long time, it reaches the "steady state". [6] Once the
steady state is reached, there is very little change in its behavior from
one time interval to the next. At this point, the probabilities of being
in a given state are derivable. The probability of being in state "W" (P_W)
and the probability of being in state "F" (P_F) are of particular interest:


(9)  P_W = \frac{1}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}


(10)  P_F = \frac{\lambda/\delta}{1 + \lambda/\mu + \lambda/\delta + \alpha/\beta}


where "\lambda" is the failure rate of the unit in (FAILURES/HR.).

"P_W" is called the "real availability". This is the
probability that the system is producing correct results. When the unit
fails, however, there is a certain latency time before this failure is
detected. Before detection occurs, the user assumes the system is available,
unaware that it is in the failure state. For this reason, the probability
of being in state "W" or state "F" is called the "apparent availability"
(A_APPR).


(11)  A_REAL = P_W


(12)  A_APPR = P_W + P_F


(13)  A_APPR = A_REAL + P_F
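
The steady-state expressions can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the expressions for the detected state "D" and the checker-repair state "C" are not quoted from the report but follow from balancing the flows of the four-state chain (W to F at rate lambda, F to D at rate delta, D to W at rate mu, W to C at rate alpha, C to W at rate beta).

```python
def steady_state(lam, delta, mu, alpha, beta):
    """Steady-state probabilities of the states W, F, D, C (all rates per hour)."""
    denom = 1.0 + lam / mu + lam / delta + alpha / beta
    return {
        "W": 1.0 / denom,              # equation (9)
        "F": (lam / delta) / denom,    # equation (10)
        "D": (lam / mu) / denom,       # assumed: from flow balance, not quoted
        "C": (alpha / beta) / denom,   # assumed: from flow balance, not quoted
    }


def availabilities(lam, delta, mu, alpha, beta):
    """Return (real, apparent) availability, equations (11) and (12)."""
    p = steady_state(lam, delta, mu, alpha, beta)
    return p["W"], p["W"] + p["F"]


# Illustrative rates only: unit MTTF 1e5 h, checker MTTF 1e6 h,
# detection in 0.01 s, repairs in 0.5 h (unit) and 0.25 h (checker).
rates = dict(lam=1e-5, delta=3600 / 0.01, mu=2.0, alpha=1e-6, beta=4.0)
probs = steady_state(**rates)
a_real, a_appr = availabilities(**rates)
print(probs, a_real, a_appr)
```

The four probabilities sum to one, and the apparent availability exceeds the real availability by exactly P_F.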


2.  The  Non-Maintained  System 

This  system  consists  of  a functional  unit  and  a checker.  There  is 
no  repair  done,  however,  when  a unit  failure  is  detected,  or  when  the 
checker  fails. 


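1/\lambda = mean time to unit failure

1/\delta = mean time to unit failure detection

1/\alpha = mean time to checker failure


Then, the "real reliability" of the system is given by:


(14)  R_R(t) = e^{-(\lambda+\alpha)t}


The "apparent reliability" of the system is given by:


(15)  R_A(t) = \frac{\delta}{\delta-\lambda} e^{-(\lambda+\alpha)t} - \frac{\lambda}{\delta-\lambda} e^{-(\delta+\alpha)t}


And the real and apparent mean times to failure for the system, obtained
by integrating (14) and (15) over all t:


(16)  MTTF_R = \frac{1}{\lambda+\alpha}


(17)  MTTF_A = \frac{\delta}{(\delta-\lambda)(\lambda+\alpha)} - \frac{\lambda}{(\delta-\lambda)(\delta+\alpha)}


It is these quantities that are of interest in the non-maintained
system.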
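
As a hedged illustration of the non-maintained quantities, the sketch below evaluates the real and apparent reliabilities and the corresponding mean times to failure. The function names are hypothetical; the apparent MTTF is obtained here by integrating the apparent reliability term by term, which is an assumption about how the report's own expression was derived.

```python
import math


def real_reliability(t, lam, alpha):
    """Probability that neither unit nor checker has failed by time t."""
    return math.exp(-(lam + alpha) * t)


def apparent_reliability(t, lam, alpha, delta):
    """Probability of being in W or in the undetected state F (delta != lam)."""
    c = 1.0 / (delta - lam)
    return c * (delta * math.exp(-(lam + alpha) * t)
                - lam * math.exp(-(delta + alpha) * t))


def mttf_real(lam, alpha):
    """Integral of the real reliability over all t."""
    return 1.0 / (lam + alpha)


def mttf_apparent(lam, alpha, delta):
    """Integral of the apparent reliability over all t, term by term."""
    return (delta / (lam + alpha) - lam / (delta + alpha)) / (delta - lam)


lam, alpha = 1e-5, 1e-6        # unit MTTF 1e5 h, checker MTTF 1e6 h
delta = 3600 / 0.01            # detection in 0.01 s, expressed per hour
mr = mttf_real(lam, alpha)
ma = mttf_apparent(lam, alpha, delta)
print(f"MTTF_R = {mr:.2f} h, MTTF_A - MTTF_R = {ma - mr:.3e} h")
```

With these rates the real MTTF is about 90909.09 hours, and the apparent MTTF exceeds it only by a few microhours, which is why the two values look identical at ordinary precision.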

III. BEHAVIOR OF SYSTEM


Choosing values for the equations of interest was a problem with
no clear answers. A small (eight bit) adder can be constructed that has
a mean time to failure (MTTF) of 10^5 hours. The checker, being a smaller,
less complex device, can have an MTTF of 10^6 hours. If, for example, a
modulus three bit checker is employed, then hardware detection can be done
in 0.01 seconds. Allow 15 minutes to repair the checker and 20 minutes
to repair the unit.

Given these parameter values, the availabilities for the maintained
system, to seven significant digits, are:

A_REAL = 0.9999947

A_APPR = 0.9999947

For the non-maintained system:

MTTF_R = 90909.09 hrs.

MTTF_A = 90909.09 hrs.

If the detection time is increased tenfold to 0.1 seconds, still
no change is registered within this precision (see Tables 1 and 2).
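
The insensitivity at seven digits can be reproduced with a short calculation. This is a sketch under stated assumptions: it uses the Case I repair times from the case table (0.5 hours for the unit, 0.25 hours for the checker), which reproduce the quoted figure, and it truncates rather than rounds to seven digits; neither choice is stated in the report. The helper names are hypothetical.

```python
import math


def availabilities(mttf_unit, mttf_checker, mttr_unit, mttr_checker, mtdf):
    """Real and apparent availability; every argument is a mean time in hours."""
    lam, alpha = 1.0 / mttf_unit, 1.0 / mttf_checker
    p_w = 1.0 / (1.0 + lam * mttr_unit + lam * mtdf + alpha * mttr_checker)
    p_f = lam * mtdf * p_w
    return p_w, p_w + p_f


def truncate7(x):
    """Keep seven decimal digits without rounding (an assumed convention)."""
    return math.floor(x * 1e7) / 1e7


results = {}
for seconds in (0.01, 0.1):
    real, appr = availabilities(1e5, 1e6, 0.5, 0.25, seconds / 3600.0)
    results[seconds] = (real, appr)
    print(f"MTDF {seconds:4.2f} s: real {truncate7(real):.7f}, "
          f"apparent {truncate7(appr):.7f}, gap {appr - real:.3e}")
```

The gap between apparent and real availability is of the order of 1e-11 to 1e-10 in this range, far below the seventh digit, which motivates the use of double precision described next.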

For  this  reason,  double  precision  (two  full  words  of  memory)  was 
required  for  all  values.  For  producing  graphs,  extensive  scaling  had  to  be 
done.  In  general,  those  digits  that  remained  constant  for  both  the  real 
and  apparent  cases,  over  a range  of  detection  times,  were  subtracted  out. 

In  order  to  examine  how  availabilities  (maintained)  and  MTTF's 
(non-maintained)  depend  upon  latency,  five  detection  time  ranges  were  used. 




TABLE 1

[Table values not recoverable from the source scan.]

TABLE 2

[Table values not recoverable from the source scan.]


MTDF = 0.01 to 0.1 seconds

       0.1 to 1.0 seconds

       1.0 to 10. seconds

       1.0 to 10. minutes

       10. to 20. minutes


In order to extend the analysis to more complex units, additional
parameter values were used. In all, three cases were examined.

CASE:             I       II      III

MTTF-DEVICE:      10^5    10^4    10^3

MTTF-CHECKER:     10^6    10^5    10^4

MTTR-DEVICE:      0.5     1.0     2.0

MTTR-CHECKER:     0.25    0.5     1.0

The MTTR's apply only to the maintained system. All values are
given in hours. In accordance with the checker being a less complex
mechanism than the unit device, it is assumed that the checker takes longer
to fail and is quicker to repair. Each of these three cases was examined
over the five specified latency time ranges.

1. The Maintained System

See  Graph  2.  As  indicated  in  the  graph  labels,  this  depicts  the 
real  and  apparent  availabilities  over  the  first  three  latency  ranges. 

The straight line at the top (1.0 x 10^-8) represents the apparent
availability. Its horizontal character indicates that this value undergoes
no change relative to the real availabilities. Of course, as the latency
increases, the apparent availability also increases since the user is
unaware of a failure for a longer time. This increase can be seen in
Table 1, and takes place in the 10^-13, 10^-14 and 10^-15 decimal places.

The  almost  straight  line  directly  below  the  apparent  availability 
is  the  real  availability  for  MTDF  from  0.01  to  0.1  seconds.  The  system 
spends  the  vast  majority  of  its  time  in  the  working  state.  When  a unit 
failure  does  occur,  it  is  detected  very  quickly,  repaired,  and  returns 
to  the  working  state.  Due  to  this  very  small  detection  latency,  there  is 
little  difference  between  the  real  and  apparent  availabilities. 

The  next  curve  below  is  the  real  availability  for  MTDF  from  0.1  to 
1.0  seconds.  When  the  unit  does  suffer  a failure  it  now  takes  longer  to 
detect,  thus  delaying  the  return  to  the  working  state.  Therefore  the  real 
availability  is  decreased,  as  indicated  by  the  wider  gap  from  the  apparent 
availability. 

The bottom curve is the real availability for MTDF from 1.0 to
10. seconds. The increased latency is reflected by a greatly decreased
real availability. Furthermore, the system's availability is now sensitive
to small relative changes in the detection time. This is reflected by the
steep descending slope of the real availability.

A similar pattern is seen for the remaining two MTDF time ranges
in Graph 3. Again, the apparent availability shows no relative change. The
real availabilities show sensitivity to latency increases.

An identical picture is produced when the MTTF's are decreased,
and the MTTR's are increased (Cases II and III). The only difference for
these cases of increased system complexity is that the absolute availability
values are decreased.

Appendix  I presents  the  graphs  and  tables  for  the  remaining  cases. 

2. The Non-Maintained System


The  same  parameter  values  over  the  first  three  latency  time  ranges 
are  used  in  Graph  4 for  the  non-maintained  system.  The  bottom,  horizontal 
curve,  just  above  the  abscissa,  is  the  real  MTTF  of  the  system. 

Unlike  the  maintained  system,  where  the  apparent  availability  was 
only  a relative  constant,  this  real  MTTF  is  an  absolute  constant.  As  shown 
in  equation  (16)  the  real  MTTF  depends  only  on  the  MTTF's  of  the  unit 
device  and  the  checker. 

Since  these  values  are  fixed  over  all  five  latency  ranges,  there  is 
no  change. 

The almost horizontal line directly above is the apparent MTTF for
MTDF from 0.01 to 0.1 seconds. When detection is done very quickly, there
is little difference between the real and apparent MTTF's. (When detection
is immediate, the two MTTF's are identical.) This first apparent MTTF,
then, is relatively close to the immediate detection case.

The next curve above is the apparent MTTF for MTDF from 0.1 to 1.0
seconds. This increased latency makes the system appear to be working
correctly longer than it really is. This is reflected by the wider gap
from the real MTTF.

The top curve is the apparent MTTF for MTDF from 1.0 to 10. seconds.
The increased latency is reflected by the increased MTTF values. In this
latency range, the apparent MTTF is more sensitive to changes in the MTDF,
hence the sharply increased slope of the top curve.

The  values  and  graphs  of  the  remaining  cases  over  all  five  latency 
ranges  are  given  in  Appendix  II. 

A clear  picture  of  what  is  happening  can  be  obtained  by  examining 


depicts  such  a ratio  versus  the  rate  of  detection. 

Thus, as the checker becomes more powerful (i.e., faster
error detection) the more closely the apparent MTTF
approaches the real MTTF. The limiting case, of course,
is the infinitely powerful checker. Then, error
detection is immediate, and the two MTTF's are
equivalent.
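
The limiting behavior can be illustrated numerically. The sketch below assumes that the ratio in question is the apparent MTTF divided by the real MTTF, and it uses the simplified closed form MTTF_A = (lambda + alpha + delta) / ((lambda + alpha)(alpha + delta)), which is an algebraic rearrangement of the integrated apparent reliability rather than a formula quoted from the report. The function name is hypothetical.

```python
def mttf_ratio(lam, alpha, delta):
    """Apparent MTTF over real MTTF; equals 1 + lam / (alpha + delta)."""
    mttf_real = 1.0 / (lam + alpha)
    mttf_apparent = (lam + alpha + delta) / ((lam + alpha) * (alpha + delta))
    return mttf_apparent / mttf_real


lam, alpha = 1e-5, 1e-6   # per hour, illustrative values
for delta in (1e-3, 1.0, 1e3, 1e6, 1e9):
    print(f"delta = {delta:8.0e} per hour  ratio = {mttf_ratio(lam, alpha, delta):.12f}")
```

The ratio falls monotonically toward 1 as the detection rate grows, which is the limiting case of the infinitely powerful checker.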

IV. ANALYSIS OF THE CHECKER

In a private communication, various detection rates ("\delta") for
corresponding values of checker moduli, M, were provided. [6]

\delta (sec^-1)     M

0.66 x 10^6         3

0.80 x 10^6         5

0.86 x 10^6         7

0.91 x 10^6         11

0.92 x 10^6         13

For example, consider a system with a unit MTTF of 10^5 hours and a
checker MTTF of 10^6 hours: if a modulus three addition checker were
employed, then the rate of detection would be 0.66 x 10^6 sec^-1.

It was desired to model these experimental data points by a curve
of the form:


(18)  \delta = \delta_0 M^a


In order to fit the data to the curve, it was necessary to transform
this non-linear equation into a linear form.

By taking the logarithm (natural) of equation (18):

(19)  \ln(\delta) = \ln(\delta_0 M^a)

(20)  \ln(\delta) = \ln(M^a) + \ln(\delta_0)

(21)  \ln(\delta) = a \ln(M) + \ln(\delta_0)


By this means, a linear regression could be performed using the
logarithms of "\delta" and "M":


ln(\delta)     ln(M)

13.40          1.10

13.59          1.61

13.66          1.95

13.72          2.40

13.73          2.56

A least squares regression on these data produced the following
equation:

(22)  \ln(\delta) = (0.22) \ln(M) + 13.19

Taking the anti-log of each side produces the equation of the desired form:

(23)  \delta = e^{13.19} M^{0.22} \approx (0.53 x 10^6) M^{0.22}
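
The fit can be reproduced from the tabulated logarithms. The sketch below is a plain ordinary-least-squares calculation written out by hand; the function and variable names are hypothetical, and the data are the five (ln M, ln delta) pairs listed above.

```python
import math

ln_m = [1.10, 1.61, 1.95, 2.40, 2.56]
ln_delta = [13.40, 13.59, 13.66, 13.72, 13.73]


def least_squares(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


a, ln_delta0 = least_squares(ln_m, ln_delta)
delta0 = math.exp(ln_delta0)   # multiplicative constant of the power law
print(f"a = {a:.3f}, ln(delta_0) = {ln_delta0:.3f}, delta_0 = {delta0:.3e} per second")
```

The slope comes out close to 0.22 and the intercept close to 13.2, in line with the regression reported in the text; small differences are expected because the tabulated logarithms are rounded to two decimals.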

To determine the goodness of the fit, the given experimental values
for "\delta" were compared to those values produced by equation (23).


M      \delta-EXPERIMENTAL     \delta-MODEL
       (x 10^6 sec^-1)         (x 10^6 sec^-1)

3      0.66                    0.69

5      0.80                    0.77

7      0.86                    0.83

11     0.91                    0.92

13     0.92                    0.95


A chi-squared test, with four degrees of freedom, yielded a value of 0.4622.
Upon table look-up, this showed a level of confidence of 97.5%. Thus, the
model provided an extremely good fit of the data points.
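
A rough check of the goodness-of-fit figure is sketched below. The report does not say in which units the statistic was computed; as an assumption, the Pearson statistic is evaluated on the tabulated values multiplied by 100 (that is, in units of 10^4 sec^-1), which gives a value close to the one quoted. The function name and the scaling are illustrative only.

```python
experimental = [0.66, 0.80, 0.86, 0.91, 0.92]   # x 10^6 per second
model = [0.69, 0.77, 0.83, 0.92, 0.95]          # x 10^6 per second


def pearson_chi_squared(observed, expected, scale=1.0):
    """Sum of (O - E)^2 / E after multiplying both series by `scale`."""
    return sum((scale * o - scale * e) ** 2 / (scale * e)
               for o, e in zip(observed, expected))


chi2 = pearson_chi_squared(experimental, model, scale=100.0)
print(f"chi-squared = {chi2:.4f}")
```

Because the Pearson statistic scales linearly with the unit chosen, the agreement with the quoted 0.4622 only suggests, and does not prove, that this was the scaling used.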

The larger the modulus "M" of the detector, the more powerful a
checker it will be. It will also mean a more expensive checker. Thus,
there is a trade-off. It is desirable to have a powerful checker, capable
of quickly detecting a large class of faults. It is not desirable, however,
to have a needlessly large checker that incurs excessive cost.

In  determining  the  size  required  of  the  checker,  the  importance  of 
avoiding  the  undetected  failure  state  must  be  considered.  If  it  is  extremely 
costly  to  the  users  to  be  in  the  failure  state,  then  this  implies  that  it 
is  worth  the  extra  cost  of  having  a powerful  detector.  If  the  failure  state 
can  be  tolerated  for  a longer  period  of  time,  then  a smaller  modulus  checker 
can  be  used. 

In general,

cost of checker = C_0 log(M)

K_F = a measure of the cost of being in the failure state.

Then,  the  optimum  modulus  "M"  can  be  related  to  these  cost  factors 
as  follows:  [5] 


(24)  M_opt = \left( \frac{K_F}{C_0} \cdot \frac{a}{\delta_0} \right)^{1/a}


where "a" and "\delta_0" are from equations (18) and (23).

This equation determines the optimum modulus for a given "cost ratio".
(The cost ratio being K_F/C_0, the cost of the failure state to the cost
of the detector.)

The value of "M-opt" as a function of the cost ratio is shown in
Graph 6. As expected, the larger the cost ratio (i.e., the more intolerable
the failure state) the larger the required modulus of the detector.

V. SIMULATION OF THE MODELS


1.  Looking  for  the  Steady  State 

The  first  requirement  of  the  simulation  is  to  determine  how  long 
it  takes  to  reach  the  steady  state.  The  initial  step  is  to  generate  random 
variates  to  determine  the  time  to  failure  for  the  unit  and  the  time  to 
failure  for  the  checker.  Whichever  is  the  smaller  value  determines  which 
of  these  two  events  occurs  first.  The  smaller  time  is  kept  and  the  larger 
one  is  thrown  away.  A counter  is  also  incremented  by  one.  This  keeps 
account  of  how  many  cycles  are  executed. 

a.  The  Working  State 

Consider, for example, that the unit fails first (which, on the
average, is the case since its MTTF is smaller than that of the checker).
Then two time accumulators must be incremented by this amount. First, a
master clock has the time to unit failure added to it to mark the progression
of real time. Next, a special clock for the working state ("W") is also
incremented by this amount. This timer keeps account of the total time the
system spends in the working state.

b.  The  Failure  State. 

Next,  another  random  variate  is  generated — this  one  for  the  time 
to  detection.  Again,  the  master  clock  is  incremented.  Then,  another  special 
clock,  one  that  keeps  account  of  the  time  spent  in  the  failure  state  ("F") 
is  also  incremented  by  this  amount. 

c.  The  Detected-Repair  State. 

Finally, a random variate is generated that determines the time
for repair. The master clock, as always, is incremented by this amount.
A special clock that keeps account of time spent in the repair state ("D")
is then incremented.

With repair time now accounted for, the system is back in the
working state. The cycle is completed. The execution of such a sequence
results in PRINT-OUT-1 being produced.

The  cycle  is  repeated  by  again  generating  random  variates  for 
checker  failure  time  and  unit  failure  time,  and  incrementing  the  counter 
by  one.  If,  for  example,  the  checker  fails  first,  a similar  sequence  occurs. 
First,  the  master  clock  and  the  working  state  clock  are  both  incremented 
by  this  amount. 

Next, a random variate is generated to give the time for checker
repair. The master clock and the "C" state clock are both incremented.
The cycle is completed and the system is once again back in the working
state. The checker-failure-first sequence produces PRINT-OUT-2.
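
The cycle just described can be sketched in a few lines. This is a minimal illustration, not the report's PL/I program: the function name, the dictionary of clocks and the use of Python's `random.Random.expovariate` for the exponential variates are all assumptions made for the example.

```python
import random


def simulate_cycles(n_cycles, lam, delta, mu, alpha, beta, rng):
    """Run n_cycles of the maintained model and return the accumulated clocks."""
    clocks = {"master": 0.0, "W": 0.0, "F": 0.0, "D": 0.0, "C": 0.0}
    for _ in range(n_cycles):
        t_unit = rng.expovariate(lam)       # time to unit failure
        t_checker = rng.expovariate(alpha)  # time to checker failure
        if t_unit <= t_checker:
            # Unit fails first: working, then undetected failure, then repair.
            stages = [("W", t_unit),
                      ("F", rng.expovariate(delta)),
                      ("D", rng.expovariate(mu))]
        else:
            # Checker fails first: working, then checker repair.
            stages = [("W", t_checker), ("C", rng.expovariate(beta))]
        for state, dt in stages:
            clocks[state] += dt
            clocks["master"] += dt
    return clocks


# Illustrative rates per hour; the seed is fixed only for repeatability.
clocks = simulate_cycles(200, 1e-5, 3600 / 0.01, 2.0, 1e-6, 4.0, random.Random(1))
print(clocks["W"] / clocks["master"])
```

The smaller of the two failure times decides which branch is taken and the larger is discarded, as in the description above; the master clock always advances by the same amount as the state clock being updated.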

After  ten  executions  of  these  cycles,  a test  is  made  to  determine 
if  the  steady  state  has  been  reached.  For  the  maintained  system,  this  test 
is  done  in  the  following  manner. 

The real availability over the first five cycles is compared to
the real availability over the first ten cycles. The apparent availabilities
are similarly compared. If the (real/apparent) availability during the
first five cycle executions is "reasonably" close to the (real/apparent)
availability during the first ten cycles, then there has been little change
in the system during this interval. Thus, the steady state has been reached.
This would produce PRINT-OUT-3.

If either the real or apparent availabilities are not close over
this time interval, then a message indicating this fact is printed, and
program execution halts.

The real and apparent availabilities are calculated as follows. After
five cycles, the availabilities are determined by:

(25)  A_REAL:5 = (WORKING STATE CLOCK) / (MASTER CLOCK)

(26)  A_APPR:5 = [(WORKING STATE CLOCK) + (FAILURE STATE CLOCK)] / (MASTER CLOCK)


After ten cycles, A_REAL:10 and A_APPR:10 are calculated in the
same manner. After ten cycles, all the clocks will have larger values
than after five cycles, but it is hoped that the availabilities have
remained fairly constant.
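
The five-cycle versus ten-cycle comparison might look like the following sketch. The tolerance that counts as "reasonably" close is not given in the report, so the default used here is purely an assumption, as are the function names and the made-up clock readings in the example.

```python
def availabilities(clocks):
    """Real and apparent availability from the accumulated clocks."""
    real = clocks["W"] / clocks["master"]
    apparent = (clocks["W"] + clocks["F"]) / clocks["master"]
    return real, apparent


def steady_state_reached(clocks_5, clocks_10, tol=1e-3):
    """True when both availabilities changed by at most tol (assumed threshold)."""
    real_5, appr_5 = availabilities(clocks_5)
    real_10, appr_10 = availabilities(clocks_10)
    return abs(real_5 - real_10) <= tol and abs(appr_5 - appr_10) <= tol


# Made-up clock readings (hours) after five and after ten cycles.
after_5 = {"master": 500000.0, "W": 499997.0, "F": 0.00002}
after_10 = {"master": 1000000.0, "W": 999994.5, "F": 0.00004}
unsettled = {"master": 500000.0, "W": 400000.0, "F": 0.00002}

print(steady_state_reached(after_5, after_10))    # settled example
print(steady_state_reached(unsettled, after_10))  # availability still moving
```

In a full program the two snapshots would be taken from the running clocks after the fifth and tenth cycles, and execution would stop with a message when the check fails.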


There is no steady state for the non-maintained system
since the system goes down after each failure is detected. The
quantities of interest are the real and apparent MTTF.

The MTTF's are calculated as follows.


(27)  MTTF_R = (WORKING STATE CLOCK) / COUNTER

(28)  MTTF_A = [(WORKING STATE CLOCK) + (FAILURE STATE CLOCK)] / COUNTER
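
A hedged sketch of the non-maintained estimate follows. It assumes that each cycle is an independent run ending at the first failure, that a detection latency is drawn only when the unit fails first, and that the counter equals the number of cycles; these details, the function name and the use of `random.Random.expovariate` are illustrative assumptions, not taken from the program listing.

```python
import random


def estimate_mttf(n_cycles, lam, delta, alpha, rng):
    """Estimate (real MTTF, apparent MTTF) from n_cycles simulated runs."""
    working_clock = 0.0
    failure_clock = 0.0
    for _ in range(n_cycles):
        t_unit = rng.expovariate(lam)
        t_checker = rng.expovariate(alpha)
        working_clock += min(t_unit, t_checker)
        if t_unit <= t_checker:
            # Unit failed first: the failure goes unnoticed until detection.
            failure_clock += rng.expovariate(delta)
    counter = n_cycles
    return working_clock / counter, (working_clock + failure_clock) / counter


mttf_r, mttf_a = estimate_mttf(20000, 1e-5, 3600 / 0.01, 1e-6, random.Random(0))
print(f"MTTF_R ~ {mttf_r:.1f} h, MTTF_A ~ {mttf_a:.1f} h")
```

With many cycles the real estimate settles near 1/(lambda + alpha), but individual cycles scatter widely because the time to first failure is itself exponentially distributed.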


2.  Re-attaining  the  Steady  State 

All  the  clocks,  counters,  and  various  statistical  accumulators  are 
re-set  to  zero.  The  system  then  executes  ten  cycles  in  order  to  re-attain 
the  steady  state.  During  this  period,  as  before,  a print-out  is  generated 
for  each  cycle.  Print-out-4  is  an  example. 

3. The Simulation


After ten executions of the cycle, the steady state is re-attained.
The simulation consists of thirty additional cycle executions. All clocks
and accumulators are kept accurate.

After  thirty  cycles,  a listing  is  produced  that  shows  the  model's 
values  of  the  availabilities  or  MTTF's  according  to  equations  (11),  (12)  or 
(16),  (17),  and  simulated  values  using  equations  (25),  (26)  or  (27),  (28). 

PRINT-OUT-6  is  an  example  of  the  final  statistics  produced  for 
the  simulation  of  the  maintained  system.  FRIN7-GUT-7  is  an  example  for 
the  non-naintained  system. 

The  complete  program  listing  is  given  in  Appendix  III,  including 
results  from  additional  simulation  runs. 

4.  Evaluation  of  the  Simulation 

As  seen  in  PRINT-OUT-6,  the  model  and  the  simulation  availabilities 
agree  to  five  decimal  places.  Furthermore,  the  variances  for  the  simulation 
values  are  of  the  order  of  10  ^ . This  means  that  the  availability  values 
are  a very  good  indication  of  the  state  of  the  system  during  any  particular 
time  interval. 

For the non-maintained system, the variances are close to the
theoretic value of (1/\lambda)^2. Thus, the MTTF's give only a very general
indication of how long the system is in the working state. There is a
wide variation of values for any particular cycle.

5.  Various  Mechanisms  of  the  Simulation 

In presenting the simulation, a number of fine points were
purposefully neglected in order that the main aspects of the program
design would not be obscured.

One of these neglected areas concerns the method of generating
the random variates that determine the times of failure, detection and,
for the maintained case, repair. The first step is to generate a uniform
random number between 0.0 and 1.0. This was done by means of a built-in
procedure call available in PL/I. This procedure, "VARGEN", must be passed
parameters that indicate the type of distribution and the range of the
random numbers desired [7].

Consider,  for  example,  calculating  the  time  to  unit  failure.  The 
probability  of  such  a failure  in  less  than  "t"  time  units  is  given  by: 

(29)  Pr(T_f < t) = 1 - e^{-\lambda t}

Then, the reliability, "R", of the unit:

(30)  R = e^{-\lambda t}

(31)  \ln R = \ln(e^{-\lambda t})

(32)  \ln R = -\lambda t

(33)  t = -(1/\lambda) \ln(R)

In  order  to  calculate  the  time  to  detection  or  repair,  the  random 
number,  is  factored  by  -(1/6)  or  -(1/y),  respectively. 
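
Equation (33) is the standard inverse-transform step for an exponential distribution: a uniform random number takes the place of R. The following Python sketch illustrates it; it uses Python's own generator rather than the PL/I procedure named in the text, and the function name is an assumption.

```python
import math
import random


def exponential_time(rate: float, rng: random.Random) -> float:
    """Draw a time from an exponential distribution with the given rate."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and log() is defined.
    r = 1.0 - rng.random()
    # Equation (33): t = -(1/rate) * ln(R).
    return -(1.0 / rate) * math.log(r)
```

The same function serves for failure, detection and repair times; only the rate passed in changes.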

Another aspect that was glossed over concerned the test made to
determine if the steady state had been reached. The strategy used here
was that the means justify the ends.

Consider, for example, the maintained system. Assume that the
original test for the real availability was
|A_{REAL:5} - A_{REAL:10}| < 10^[exponent illegible].
For most runs, this test would not succeed. In fact, even
10^[exponent illegible] would often fail. So, a difference of
10^[exponent illegible] would be tried. If this worked, the
A HIGH LEVEL DIGITAL COMPUTER SIMULATOR
WITH FAULT INJECTION FACILITIES


1. INTRODUCTION

A major consideration in the design and development of a
computer system today is the reliability of that system. The
need for high reliability in aerospace computers and computers
used in military applications is obvious. This need is
expanding to encompass many other areas such as banking, stock
exchanges, and communications, and will doubtless continue to
be important as the range of uses of computer systems
increases.

The National Aeronautics and Space Administration defines
reliability as "the probability of a device performing
adequately for the period of time intended under the operating
conditions encountered." Two major approaches to the study of
reliability today are (1) the analytical model, and (2) the
consideration of certain real hardware organizations such as
the Jet Propulsion Laboratory's STAR (Self-Testing and
Repairing) computer.


The analytical model requires the analytical [words missing] of
modules and thus the system collectively.
Real hardware organizations, unlike analytical models,
facilitate performance evaluation based on the processing of
typical workloads given. This is an evaluation relatively
simple compared to that of analytical models. However, actual
hardware configurations do not allow the functional flexibility
required for the study of a wide range of system
configurations, and an alternative approach should then be
considered.

Ideally, a system simulator capable of supporting a wide
range of system configurations, in addition to allowing the use
of statistical distributions to approximate various fault
environments, is needed. This of course must be accomplished
with minimal cost. The solution proposed here is a simulator
which combines the functional flexibility of a simulated
hardware organization with the ability to process a typical
workload on the simulated machine subject to a specified fault
environment. The simulator is capable of supporting many
2. OBJECTIVES

There were many considerations in the development of the
simulator. As discussed earlier, a large number of system
configurations should be supported to allow the user
flexibility. The simulator should also support a wide range of
functions, in particular, functions which would allow the
simulation of faults occurring in real systems. The ease of
use was also a major consideration.

More specifically, the simulator proposed here allows the
user to accomplish the following:

(1)  to configure the system(s) under investigation,

(2)  to generate a fault environment and distribute
faults over a specified mission time using
appropriate statistical distributions,

(3)  to program fault-detection capabilities to study
manifestations of faults,

(4)  to program fault-correction capabilities, and

(5)  to observe the performance of system(s) with
regard to a typical workload when such
systems are subject to various fault
environments.


Traditionally, simulators have been somewhat restrictive in
the support of a wide range of systems. This simulator
resolves this problem by providing basic system building blocks
which allow the user to create a system. Also provided are
special fault-processing instructions which allow the injection
of faults at will in the created system. Additionally, an
instruction set is provided which allows the user to program
the system to detect and recover from injected faults in a
prescribed manner.

The user may devise a typical workload suitable for
processing on the simulated system and note the performance of
the system subject to various fault environments. In general,
parameters may be inserted into the system accounting for the
system configuration, the fault environment, the distribution
of faults over time, and the fault-detection and
fault-correction algorithms. To accomplish the above with ease,
the instructions provided are similar to those of a simple assembly
language. The user is, therefore, required to spend a minimal
amount of time learning to program the system.
3. SIMULATOR DESCRIPTION

A fault may be defined as "any change in a system which
causes it to behave differently from the original system" (3).
Faults may be classified as logical or parametric. A logical
fault is one which causes the logic function of a circuit
element (or elements) or an input signal to be changed to some
other signal (3). Examples of logical faults are stuck-at-one
(s-a-1) faults, where a circuit signal becomes stuck at the
logical value of 1, and stuck-at-zero (s-a-0) faults, where a
circuit signal becomes stuck at the logical value 0.
Parametric faults frequently alter the magnitude of a circuit
parameter, causing a change in some factor such as circuit
speed, current, or voltage (3). This simulator is concerned
with the simulation of logical faults.

Logical faults may be further classified as permanent and
transient. The stuck-at faults mentioned above are examples of
permanent faults, i.e. once the fault occurs it permanently
becomes a part of the system unless corrected. A transient
fault exists over a given time, after which the faulty signal
reverts to the fault-free mode. Transient faults may have many
causes, for example, a change in environment (temperature,
humidity) or a change in system load. Transient faults have been
found to be responsible for up to 35% of faults in systems today
(10).
3.1  Development

Throughout the development of the simulator the concepts
of wide applicability, functional capability, and ease of use
were emphasized.

With these considerations in mind, the first design was
proposed. The user was allowed total freedom in creating a
system by defining basic units and then specifying a
configuration of these units into a system. Unit
specifications included field sizes (i.e., number of bits),
field types (i.e., whether the fields were input fields or
output fields or both), the number of fields within the units,
and the means of transmission (serially or in parallel) into
and out of the fields of the unit.

Inherent in this design were many problems for both the
user and the implementor. Because the units were defined at a
very low level, many specifications were required that had not
been considered initially. If a unit contained more than one
input field and one output field, it was necessary to specify
how the fields within the unit were connected to each other.
These specifications also presented inconveniences in the
programming of the created system, in that the user was required
to specify many operations in order to perform very simple tasks.
Additionally, the implementation of this scheme posed many
problems. It was very difficult to create a system which
entailed a variable number of unit types, a variable number of
units within type, a variable number of fields within units,
three possible types of unit fields (input, output, or both),
and also a variable number of bits within fields.
Specifications for all interconnections between fields within
units and between fields of different units were also required.
This was very cumbersome.

Many checks were also necessary when data was moved
between fields. It was necessary to check whether the fields
were of the same size, whether data transmission between fields
was compatible (serially or in parallel), and whether data was
being moved illegally, e.g., data was being transmitted from an
input field of one unit into an output field of another unit.

The generation of understandable object code was also
nearly impossible. Since word sizes in the units (including
memory) were variable, the formatting of object code to be
loaded into memory was difficult, and any sort of addressing
scheme became unnecessarily complex to implement.

Thus, for the benefit of both the user and the
implementor, a new design approach was considered.
In the scheme which was finally adopted, instead of the
user defining units, a resource pool with six basic unit types
is provided from which a system may be configured. These are
arithmetic logic units (ALU's), central processor units
(CPU's), memory units consisting of up to 4096 words (2 ** 12
words), bus units, peripheral device units, and voter switch
discriminator (VSD) units. All registers within the units are
of a fixed length of thirty-two bits, compared to the variable
number of bits in the previous design. A system can be
configured using these basic units with up to a total of 123
units (provided there are no memory space limitations in the
host machine). For further documentation see Section 3.2.

The programming language which was needed to perform
operations on the created system was developed with the
following considerations. In order to facilitate the injection
of faults at the bit level, a low-level language was considered
desirable. It was also necessary that the language support the
modeling of typical logical faults. Additionally, it was
important that the user be able to learn the language with ease
and with minimal investment of time and effort.

The language is based on an assembly-type language with
special instructions added to facilitate the modeling of
specific faults. Since the language is very similar to an
assembler language, the programmer should have no difficulty
learning its use quickly. For further documentation see
Section 3.3.

Thus, the simulator proposed consists of three major
phases: Phase-I, the definition phase; Phase-II, the
cross-assembler phase; and Phase-III, the execution phase. Like most
simulators today, it is an event-driven simulator rather than a
compiler-driven simulator. An event in this case is defined as
a change in value of a signal line.


3.2  PHASE-I: Definition Phase


The objective of the definition phase of the simulator is
to obtain a description of a digital system from the user and
generate appropriate data structures to be used by Phase-II and
Phase-III of the program. As discussed previously, the user is
given a set of functionally pre-defined units with which the
system must be constructed (i.e. arithmetic logic units,
central processing units, memories, busses, peripheral devices,
and voter switch discriminator units or VSD units). Each of
these units has been assigned appropriate input, output, and
i/o registers which are classified as follows:

TYPE                            LABEL

1)  data register               d, d1, d2, etc.
2)  mode register               m
3)  status register             s
4)  address register
5)  general purpose registers   r1, r2, r3, etc.

As a note of clarification, these register-types can serve in
either an input, output, or i/o capacity with respect to a
particular unit-type.


3.2.1  Unit Structures

The basic units are constructed as follows:

Arithmetic Logic Unit

FIELD                         SYMBOL

general purpose register 1    r1
general purpose register 2    r2
mode register                 m
data register                 d
status register               s

General purpose registers one and two are used as the two
input registers, and the data register is used as the output
register.
data register                 d

The address register may hold the address of the word to
be fetched from memory, with the data register holding the
fetched word.

Central Processor Unit

FIELD                         SYMBOL

mode register                 m
status register               s
general purpose register 1    r1
general purpose register 2    r2
general purpose register 3    r3
general purpose register 4    r4
general purpose register 5    r5
general purpose register 6    r6
general purpose register 7    r7
general purpose register 8    r8

Bus Unit

FIELD                         SYMBOL

status register               s
data register                 d
address register
Voter Switch Discriminator Unit

FIELD                         SYMBOL

mode register                 m
data register 1               d1
data register 2               d2
data register 3               d3
data register 4               d4
data register 5               d5
data register 6               d6
data register 7               d7
data register 8               d8
status register               s
data register                 d
general purpose register 1    r1

Peripherals

FIELD                         SYMBOL

mode register                 m
data register 1               d1
status register               s
data register                 d

The status registers in each of the units indicate whether
the unit is operational or non-operational, and can be set or
reset by the user.
The user may utilize as many or as few of the fields
within the units as desired. The units are constructed to
serve in a very general purpose manner in order to allow the
simulation of as many systems as possible.


3.2.2  System Creation

The task of defining a digital system is now one of
creating the units and interconnecting the unit registers or
fields of the units. A high level, symbolic language has been
developed which enables the user to describe these
interconnections to the simulator.


The creation of a unit is accomplished by specifying an
identifier of the general form "unit-identifier-name.fieldname",
where the first character of the field name indicates the unit
type ("a" for alu, "m" for memory, and so forth), and the
remainder of the field name is the symbol of one of the designated
fields of the unit (e.g. d1, r1, s). Thus "Alu1.ar1" creates an
arithmetic logic unit with the identifying name "Alu1".

Field connections are specified easily by using ">"
between two field names. This indicates that the contents of
the field on the left of the ">" may be transmitted to the
field on the right of the ">". For example, "Alu1.ad >
Vsd1.vr1" creates two units, an arithmetic logic unit with
identifying name "Alu1" and a voter switch discriminator unit
with identifying name "Vsd1". The statement also specifies
that the contents of the data register of "Alu1" may be
transmitted to general purpose register 1 of "Vsd1". Thus, an
entire system may be created using similar specifications.
(Formal language constructs of the above may be found in
Section 4.1.)

3.2.3  Program 


The definition phase of the simulator is an interpreter
whose inputs are the command lines described above and whose
outputs are a series of tables which describe the defined
system. This language was developed so that, in a single pass,
the input file is scanned character by character with no
look-ahead or look-behind required.

Every occurrence of a properly constructed identifier in
the input stream causes the interpreter to search for the label
portion of that identifier to determine if a similar label has
been used previously. If the identifier has appeared before,
the unit-types of the new identifier and the old one are
checked for consistency. Otherwise a new unit tagged with the
specified label is allotted space. Thus, the appearance of an
identifier in the input stream causes the associated unit to be
defined or checked for consistency. This completely eliminates
the need for declaration statements in the language.
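
The define-or-check behaviour described above can be illustrated with a short Python sketch. It is not the interpreter itself: it splits each statement on ">" instead of scanning character by character, it does not validate field symbols, and the function names are hypothetical. The unit-type letters "a", "c", "m" and "v" follow the examples in the text, while "b" and "p" for bus and peripheral units are assumed.

```python
# Unit-type letters: "a", "c", "m" and "v" follow the text's examples;
# "b" and "p" are assumed here for bus and peripheral units.
UNIT_TYPES = {"a": "ALU", "c": "CPU", "m": "memory",
              "v": "VSD", "b": "bus", "p": "peripheral"}


def parse_identifier(text):
    """Split "Alu1.ad" into (unit name, unit type, field symbol)."""
    name, dot, field = text.strip().partition(".")
    if not name or not dot or len(field) < 2 or field[0] not in UNIT_TYPES:
        raise ValueError(f"malformed identifier: {text!r}")
    return name, UNIT_TYPES[field[0]], field[1:]


def define_system(lines):
    """Build the unit table and connection list in one pass over the input."""
    units = {}         # unit name -> unit type
    connections = []   # ((source unit, field), (destination unit, field))
    for line in lines:
        idents = [parse_identifier(part) for part in line.split(">")]
        for name, unit_type, _ in idents:
            # First appearance defines the unit; later ones are checked.
            if units.setdefault(name, unit_type) != unit_type:
                raise ValueError(f"unit {name!r} used with two unit types")
        for (src, _, src_field), (dst, _, dst_field) in zip(idents, idents[1:]):
            connections.append(((src, src_field), (dst, dst_field)))
    return units, connections
```

Because a unit is entered in the table the first time its name is seen, no separate declaration step is needed, which mirrors the point made in the text.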

The interpreter is also characterized by powerful
error-detecting capabilities. Because only a single pass is utilized
and because of the scanning techniques employed, the exact
location of an error is always known. In the event of an
error, an appropriate error message is printed and the exact
location of the infraction is indicated. In order to prevent
confusing or unrelated error messages, the remainder of the
command line is then ignored.


3.3  PHASE-II: Cross-Assembler

One of the unique features of the simulator is its
fault-injection capability. This capability is incorporated in the
programming language designed to program the system created in
Phase-I. As was mentioned previously, this language is similar
to an assembly language but with additional instructions added
to facilitate the modeling of typical faults. This language
will henceforth be called the assembly language or assembler.
The objective of the cross-assembler is to read the user's
assembly program, decode the instructions, generate the object
deck, and load it into a specified memory.
3.3.1  Development of Instruction Repertoire

The major consideration in the development of the
programming language was the devising of an instruction
repertoire which would support the modeling of as many logical
faults as possible. This included both permanent and transient
faults. Ease of use was also a significant consideration.

Since the occurrence of faults is manifested at the
register level, in order to model these faults it was important
that the user be able to easily access the registers. A
register-level language was thus considered desirable. This
immediately indicated an assembly-type language.

An assembly-type language very nicely supported the
injection of transient faults provided the faults were of short
duration. The modeling of permanent faults and transient
faults of a very long duration, however, was inconvenient. The
user was required to repeat the injection of a transient fault
many times in order to accomplish these effects.

Instructions were then devised to facilitate the modeling
of permanent faults. Thus, transient faults of short duration
and permanent faults were accommodated. To model transient
faults of longer duration, additional instructions were added
to allow the alteration of permanent fault-injection
instructions executed previously. Thus, assembly type
One additional feature that was desirable to incorporate
in the simulator was some means of indicating a time when
certain faults could be injected. In the earlier discussion,
it was assumed that the user would write a typical workload
program and inject faults into fields that were previously
referenced in the program. Since such programs could get
rather lengthy, it was unreasonable to expect the user to
calculate the position in a program where certain faults should
be injected by manually working through the timing of the
program instructions. Thus, an additional "timing" feature was
implemented whereby the user may specify the times at which
specified faults are to occur. This feature is extremely useful in
modeling specific distributions of fault times (e.g. Poisson).
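
As an illustration of how fault times for the timing feature might be prepared, the following Python sketch generates the occurrence times of a Poisson fault process over a mission time by accumulating exponential gaps. The function name and parameters are assumptions for the example; the report does not specify how the times are produced.

```python
import random


def fault_times(rate: float, mission_time: float, rng: random.Random) -> list:
    """Return increasing fault occurrence times in [0, mission_time)."""
    times = []
    t = rng.expovariate(rate)        # time of the first fault
    while t < mission_time:
        times.append(t)
        t += rng.expovariate(rate)   # exponential gap to the next fault
    return times
```

Each returned time would then be supplied to the simulator as the moment at which one specified fault is to occur.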


3.3.2  Discussion of Instructions

There are two basic instruction types with which to
program the system created in Phase-I: "regular" instructions
and "fault-injection" instructions. Within the fault-injection
instructions are two classes: (1) instructions nearly identical
to the "regular" instructions, and (2) special fault-injection
instructions.

The basic format for all the instructions of the
assembler is as follows:

label:  OP  operands

A label (optional) must be less than or equal to eight
alphanumeric characters in length and must be immediately
followed by a colon. "OP" must be a valid instruction mnemonic,
with the operands being interpreted according to the specific
instruction. Pseudo operations are also supported and are
specified by a "S" as the first character. Formal presentation
of all assembler instructions is found in Section 4.2.

The "regular" instructions are essentially equivalent to
an assembly language. There are six basic instruction types:
(1) two-operand, (2) one-operand, (3) branch, (4) immediate-operand,
(5) unit-operand, and (6) other. These are very similar to
typical assembly language instructions, with the exception of
the unit-operand instructions. A brief discussion of these
instructions follows.

The two-operand instructions are those which expect two
field names, or a field name and a label, as operands. As an
example, "AD Cpu1.cr1,Cpu1.cr2" will add the contents of
registers one and two of Cpu1 and place the result in register
one of the CPU. An "I" (preceded by a comma) may follow the
second operand to indicate that indirect addressing is to be
used in obtaining the value for the second operand. The other
two-operand instructions adhere to the same format, with the
exception of the compare instruction (C) and the memory
reference instructions (L.l and 3Th). The compare instruction
compares the two operand values and sets the condition code
register accordingly. The load register from memory and store
from register to memory instructions expect the second operand
to be a label instead of a field name.

The one-operand instructions expect a single field name as
an operand, and the field is changed according to the
instruction specified. For example, "N C? Cpu1.cr1" generates
the ones complement of the original field value in Cpu1.cr1.

The immediate-operand instructions expect two operands,
the first operand being a field and the second operand being an
integer value. As an example, "S3T Cpu1.cr1,39" subtracts the
decimal value 39 from the contents of Cpu1.cr1. The other
immediate-operand instructions adhere to this format. The
compare instruction again sets the condition code register, and
the shift instructions shift the specified field value (right
or left as indicated by the instruction) according to the
number specified in the immediate value.

The branch instructions include all branch instructions
plus the "RET" (return) and "HLT" (halt) instructions.
Conditional branches have a condition code number associated
with them which is compared to the condition code register. If
any of the set bits match, the designated branch is executed.
The condition codes are indicated as follows (leftmost bit is
bit 0):

Condition         Bit number set    Condition Code Number

equal             31                01
less than         30                02
greater than      29                04
parity (odd)      28                08
parity (even)     27                16

To branch on more than one condition, the sum of the
corresponding condition code numbers will generate the proper
condition. For example, "BC 04,label1" will branch provided
the "greater than" bit is set. To branch on the condition
"greater than or equal to", the instruction should appear as "BC
05,label1". It should be noted that the condition codes are
not checked by the simulator for rational usage; therefore,
unconditional branches using conditional branch instructions
are possible. For example, "BC 07,label1" branches on the
condition "less than, equal to, or greater than", i.e. branches
unconditionally. Such condition codes should be avoided. The
branch and save (BSA) instruction allows branching to
subroutines by storing the program counter word on a stack.
The return (RET) pops the stack and restores the program
counter to the word following the address popped. The "HLT"
instruction effectively stops the program by branching to the
end of the program.
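
The condition code test amounts to a bitwise AND between the code number in the branch instruction and the condition code register. The Python sketch below illustrates this with the code numbers from the table; the function names are hypothetical, and the compare step sets only the equal, less-than and greater-than bits because the text does not say when the parity bits are set.

```python
# Condition code numbers: bit 31 = 1, bit 30 = 2, bit 29 = 4 (leftmost bit is 0).
EQUAL, LESS_THAN, GREATER_THAN = 1, 2, 4


def compare(a: int, b: int) -> int:
    """Return the condition code register value after comparing a with b."""
    if a == b:
        return EQUAL
    return LESS_THAN if a < b else GREATER_THAN


def branch_taken(code_number: int, condition_register: int) -> bool:
    # The branch executes if any bit set in the code number is also set
    # in the condition code register.
    return (code_number & condition_register) != 0
```

With this rule, code number 05 (04 + 01) gives "greater than or equal to", and 07 always matches after a compare, which is why it behaves as an unconditional branch.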
The unit-operand instructions were added to assist the
user in performing certain operations involving an entire unit.
These instructions differ from each of the previous
instructions in that a unit name, instead of a field name or
label, is specified as an operand. Additionally, the various
instructions expect the unit operands to be of a particular
type. For all of the arithmetic unit operations, the unit type
must be an ALU. The instruction triggers the ALU in that the
two inputs are operated on as specified, the result is placed
in the output register, and the output is directed as specified by
the connections given in Phase-I. The clear unit (CLU), dump
unit (UDUMP), and SST and RST (set and reset status) instructions will
accept the name of a unit of any unit type as an operand and
perform the necessary operations. The memory dump (MDUMP)
requires the name of a memory unit as an operand and prints the
contents of the locations specified by the addresses given as
additional operands. The VOTE instruction expects the name of
a voter-switch-discriminator (VSD) unit as an operand and
triggers VSD units in the same manner that unit arithmetic
instructions trigger ALU's.

The "other" instructions include any instructions not
covered by the above, e.g. input/output. The format and use is
clearly specified in Section 4.2.
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This  concludes  the  presentation  of  the  "regular" 
instruction  types.  The  sinilarity  to  an  assembly  language  is 
thus  obvious.  The  following  discussion  details  the  fault- 
injection  instruct ions. 

As was mentioned above, there are two classes of
fault-injection instructions: (1) those similar to the "regular"
instructions and (2) the special fault-injection instructions.
Fault-injection instructions are immediately differentiated
from regular instructions by the first character of the
instruction mnemonic. The first character of every
fault-injection instruction is an asterisk (*), compared to a
letter of the alphabet for regular instructions.

The fault-injection instructions which are similar to the
regular instructions are characterized by identical instruction
mnemonics with the addition of the asterisk as the first
character (e.g. *AD). The difference between these
fault-injection instructions and their "regular" counterparts
lies in the amount of time required to perform the operations.
As is well known, the decoding of an instruction and the
execution of the operation specified require a certain amount of
clock time within the computer. This time span is simulated for
the regular instructions. However, this time span is not
simulated for the fault-injection instructions, which are
effectively decoded and executed in zero time. This allows the
user to effectively "stop the clock" and alter any of the values
in the units as desired by using the necessary fault-injection
instructions. For instance, the user may specify a
fault-injection instruction with the operands "Alu1.ar1,04"
which will effectively force bit 29 of Alu1.ar1 to the logical
value 1, i.e. a stuck-at-1. Any of the "regular" instruction
operations may thus be performed as fault-injection
instructions in the above manner with no time added to the
clock. Hence, these instructions may be used to inject certain
faults.


The above feature allows the user to inject transient
faults of a short duration. It should be noted that these
injected faults will be sustained until the field is accessed
by another operation which changes its value.


To facilitate the simulation of permanent faults, the
special fault-injection instructions "*SA0" (stuck-at-0) and
"*SA1" (stuck-at-1) were developed. For example, to permanently
fix bit 29 of Alu1.ar1 to the logical value 1, the instruction
"*SA1 Alu1.ar1,29" may be used. With every reference of the
field, bit 29 will be fixed at the logical value 1. The
instructions allow any number of bits to be specified to be
stuck-at a logical value. For example, "*SA0 Alu1.ar1,3,15,22"
fixes bits 3, 15, and 22 to the logical value 0. Later, "*SA0
Alu1.ar1,25" would fix bit 25 to logical 0 in addition to the
previous bits stuck-at logical 0.


In order to simulate transient faults of a longer duration
than that provided by the assembly-type fault-injection
instructions, additional special fault-injection instructions
were developed which allow the user to "unstick" previously
stuck-at bits. These were the "remove stuck-at-0" (*RF0) and
the "remove stuck-at-1" (*RF1) instructions. As an example,
the instruction "*RF0 Alu1.ar1,15" would "unstick" bit 15 which
was stuck-at-0 in the previous paragraph, i.e. bit 15 would no
longer be stuck-at-0 but would hold the actual logical value
assigned by an operation.

To enhance the stuck-at instructions, the "random
stuck-at" fault-injection instructions were devised. These allow
stuck-at faults to be injected at a certain frequency specified
in the instruction. For example, "*RSA0 Alu1.ar1,13,19;1/77"
states that "one of every 77 times that Alu1.ar1 is referenced,
fix bits 13 and 19 to the logical value 0". A random number
generator is used to return a number which is checked against
the fraction specified. If the number returned is less than or
equal to the fraction, the bits are fixed to the logical value
specified. Otherwise, no fault-injection is performed. Thus,
bits may be randomly stuck-at logical values.
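
The frequency check described above can be sketched as follows.
This is a minimal illustration in Python, not the simulator's own
code: the function names, the use of a uniform generator on
[0, 1), and the seed are all assumptions made for the example.

```python
import random


def random_stuck_at_fires(numerator, denominator, rng):
    """Decide whether a random stuck-at fault is injected on one reference.

    Assumption: the generator returns a uniform number in [0, 1) and the
    fault fires when that number is <= numerator / denominator.
    """
    return rng.random() <= numerator / denominator


def count_injections(references, numerator, denominator, seed=0):
    """Count how many of `references` field accesses receive the fault."""
    rng = random.Random(seed)  # hypothetical seeding, for repeatability only
    return sum(
        random_stuck_at_fires(numerator, denominator, rng)
        for _ in range(references)
    )
```

With a frequency of 1/77, roughly one reference in 77 receives the
fault over a long run; the exact count depends on the generator.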

With extensive use of these instructions, the
"bookkeeping" required of the user in order to know the current
stuck-at values of each field could become very cumbersome. To
mitigate this situation, the "remove all faults" (*RAF)
instruction was devised to clear all faults from a field
and allow the user to start the injection of faults anew.

A final special fault-injection instruction unrelated to
those above is the *DE (dead-end) instruction. This allows the
user to increase or decrease the execution operation times of
instructions. For example, the sequence

     ADI  Cpu1.cr1,52
     *DE  Cpu1.cr1,2298

states "add 2298 time units to the original execution time" for
the field Cpu1.cr1. Effectively then, the addition operation
will require the normal addition operation time plus an
additional 2298 time units. This feature allows operation
times to be easily altered dynamically, thus accounting for
possible degradation due to system faults. (Operation times
may also be changed by changing specifications in the DEFINE
file; see Section 3.5.)

As was mentioned in Section 3.3.1, the user at this point
is provided with the capability of injecting faults but has
no means of specifying when those faults are to be injected
without strenuous computation. To aid the user in this respect,
a separate file may be specified which contains fault-injection
instructions of a special format which support a timing
feature. The following example illustrates this feature:

     525   *SA0   Alu1.ar1,20,14,2,13

     874   *RSA1  Alu2.ar2,17,31,22

     259   *DE    Cpu1.cr1,15

     3586  *RAF   Alu1.ar1

The numbers on the left of the fault-injection instructions
indicate the times the particular faults are to be injected.
The user may thus calculate fault times over a given mission
time using a statistical distribution and insert the faults
according to this distribution. It is very important to note,
however, that this is the only file where times may be assigned,
and that no labels or regular instructions should appear in
this file. The program will not be executed if the format
specified is not adhered to.
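
A reader of such a timing file might be sketched as below. This
is only an illustration in Python under stated assumptions: the
names parse_timing_line and load_timing_file are hypothetical,
operands are kept as uninterpreted text, and the only checks made
are the ones named in the text (a non-negative time, and a
mnemonic beginning with an asterisk).

```python
def parse_timing_line(line):
    """Split one timing-file line into (time, mnemonic, operand_text).

    Raises ValueError when the line is not a timed fault-injection
    instruction (assumed checks: non-negative integer time, '*' mnemonic).
    """
    parts = line.split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected: <time> <*mnemonic> [operands]")
    time_text, mnemonic = parts[0], parts[1]
    operands = parts[2] if len(parts) > 2 else ""
    if not (time_text.isascii() and time_text.isdigit()):
        raise ValueError("time must be a non-negative integer")
    if not mnemonic.startswith("*"):
        raise ValueError("only fault-injection instructions are allowed")
    # Blanks inside the operand list carry no meaning in this sketch.
    return int(time_text), mnemonic, operands.replace(" ", "")


def load_timing_file(lines):
    """Parse all non-blank lines and order the events by injection time."""
    events = [parse_timing_line(line) for line in lines if line.strip()]
    return sorted(events, key=lambda event: event[0])
```

Sorting by time mirrors the idea that each event is queued at its
designated time; how the simulator itself orders them is described
in Section 3.4.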


3.3.3 Program


The program in Phase-II is very much like an assembler
with added features to accommodate the unique facilities of the
simulator. It consists of four basic parts: (1) an input
section, (2) a section to decode and queue the fault-injection
time initializations, (3) a section analogous to pass-one of an
assembler, and (4) a section analogous to pass-two of an
assembler.


The first section of the assembler reads input indicating
the memory where the program will be loaded. A check is made
of its validity and the beginning address of the specified
memory calculated. The name of the file containing the
Phase-II program is then also read and checked.

The second section initially checks for the presence of
the special fault initialization file. If it is not present,
this section is complete, as no processing is required. If it
is present, the times specified are read, the instructions
decoded, and the specified events placed on the event queue at
the designated times. Further discussion of the placement on
the event queue is found in Section 3.4.


The bulk of the work of Phase-II is accomplished in the
third section. In this section, the instructions and pseudo
operations are decoded, the labels and their corresponding
addresses are stored in a symbol table, and all of the binary
code for all of the instructions except branch instructions and
memory reference instructions (which require labels) is
generated. The addresses of the memory words corresponding to
the branch and load and store from memory instructions are also
stored in a table for use in the fourth section. Thus, all
binary code except for the instructions with labels as operands
is completed in this section.

The fourth section completes the encoding by inserting the
label address for the instructions with labels as operands.
Thus, at the end of this section, the binary code is loaded
into memory and ready for execution in Phase-III of the
simulator.

Throughout Phase-II, appropriate syntax checks of the
user's program are made. In the event of an error, an error
message is produced and an error count is incremented. If the
error count is greater than zero at the end of Phase-II, no
execution is attempted and the user is urged to correct the
errors indicated and run the program again.
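
The division of work between the third and fourth sections
follows the usual two-pass assembler pattern, which can be
sketched as below. This Python sketch is illustrative only: it
assumes one memory word per statement, a statement given as a
(label, mnemonic, operand) triple, and a hypothetical set of
label-taking mnemonics; none of these reflect the simulator's
actual binary encoding.

```python
LABEL_OPS = ("B", "BSA", "BC", "LDM", "STM")  # assumed label-taking mnemonics


def pass_one(statements):
    """Build the symbol table and emit code, deferring label operands."""
    symbols, code, fixups, errors = {}, [], [], 0
    for label, mnemonic, operand in statements:
        address = len(code)  # assumption: one word per statement
        if label is not None:
            if label in symbols:
                errors += 1  # label defined twice
            else:
                symbols[label] = address
        if mnemonic in LABEL_OPS:
            code.append([mnemonic, None])       # operand filled in later
            fixups.append((address, operand))   # remember where to patch
        else:
            code.append([mnemonic, operand])
    return symbols, code, fixups, errors


def pass_two(symbols, code, fixups, errors):
    """Insert label addresses into the words recorded during pass one."""
    for address, label in fixups:
        if label in symbols:
            code[address][1] = symbols[label]
        else:
            errors += 1  # undefined label
    return code, errors
```

As in the text, a non-zero error count at the end would mean that
no execution is attempted.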


3.4 Phase-III: Execution

The purpose of Phase-III of the simulator is to perform
the actual execution of the instructions from Phase-II of the
simulator on the system created in Phase-I. This requires that
the binary code which was loaded in memory be deciphered and
the proper actions taken.

3.4.1 Description


Of fundamental importance in the simulation is the proper
interpretation of the instructions and the correct scheduling
of the specified operations. This is accomplished by
(1) creating and maintaining an event queue and (2) maintaining a
record of the last completion time associated with a field.
Both of these actions are essential parts of the simulation.

An integral part of this scheme is the presence of a
universal clock. The decoding, scheduling, and performance of
all operations inherently depend on the clock.

To illustrate the need for and use of the above, and also to
illustrate how Phase-III operates, a sample program segment is
given in Figure 3.4.1. The code in the example calculates the
quantity ((25*35) + 2**20) and places the result in register 1
of Cpu1. The scheduling and execution of this program segment
are presented in Figure 3.4.3 using the operation times
specified in Figure 3.4.2.


The execution of the load instructions is very
straightforward. The execution of the multiplication requires
operands found in two registers. However, at the time the
instruction is decoded, one of these registers (Cpu1.cr2) is
being altered by a previous instruction. The status of the
registers must be "remembered" by the simulator, and this is
done by keeping a "last completion time" record for each
field in the simulator. Thus, the multiplication may begin at
the time both fields have completed previous operations, i.e.
clock value 10. Since a multiplication requires 19 time units,
the new completion time for the fields is now 29. The
execution of the addition instruction and the shift instruction
are handled in a similar manner. The last completion record is
also needed for the proper scheduling of the fault-injection
instructions, as is evidenced by instruction (6). The
stuck-at-0 fault is to be injected after the logical shift has
been executed; thus, the fault should be injected at the time
the logical shift is completed, i.e., clock value 15.

The need for an event queue to schedule events properly
and a completion time record to prevent overlapping operations
and fault scheduling should be evident. The importance of the
clock is also illustrated.

Another integral part of the simulator is the injection of
the stuck-at faults. Each field of the simulator has a fault
register associated with it (transparent to the user) which
indicates which (if any) stuck-at faults are to be injected.
To facilitate the injection of these faults, a stuck-at-1 mask,
a stuck-at-0 mask, a random-stuck-at-1 mask, and a
random-stuck-at-0 mask are also associated with each field.
Additionally, the random stuck-at masks require an associated
frequency field. To inject the stuck-at-0 faults, a logical
AND of the field value and the mask is performed with the
"faulted" field value as a result. To inject the stuck-at-1
faults, a logical OR is performed.
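
The mask mechanism can be sketched as follows. This Python
sketch is an illustration under stated assumptions, not the
simulator's code: a 32-bit field is assumed, bit 0 is taken to be
the leftmost bit (the numbering given for the condition code
register in Section 4.2), the class and method names are
hypothetical, and applying the AND before the OR is an arbitrary
choice for the case where both masks touch the same bit.

```python
WORD_BITS = 32                      # assumed field width
ALL_ONES = (1 << WORD_BITS) - 1


def bit_value(bit_number):
    """Weight of a bit when bit 0 is leftmost and bit 31 is rightmost."""
    return 1 << (WORD_BITS - 1 - bit_number)


class FieldFaults:
    """Stuck-at masks for one field (hypothetical helper class)."""

    def __init__(self):
        self.sa0_mask = ALL_ONES    # all 1's: no stuck-at-0 faults
        self.sa1_mask = 0           # all 0's: no stuck-at-1 faults

    def stick_at_0(self, *bits):    # corresponds to *SA0
        for b in bits:
            self.sa0_mask &= ~bit_value(b) & ALL_ONES

    def stick_at_1(self, *bits):    # corresponds to *SA1
        for b in bits:
            self.sa1_mask |= bit_value(b)

    def remove_sa0(self, *bits):    # corresponds to *RF0
        for b in bits:
            self.sa0_mask |= bit_value(b)

    def remove_sa1(self, *bits):    # corresponds to *RF1
        for b in bits:
            self.sa1_mask &= ~bit_value(b) & ALL_ONES

    def remove_all(self):           # corresponds to *RAF
        self.sa0_mask, self.sa1_mask = ALL_ONES, 0

    def apply(self, value):
        """Return the 'faulted' value: AND with s-a-0 mask, OR with s-a-1."""
        return ((value & self.sa0_mask) | self.sa1_mask) & ALL_ONES
```

For example, after stick_at_1(29) a field holding 0 reads back as
4, since bit 29 carries the weight 4 under this numbering.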


Instruction
Number       Instruction

  (1)        LI   Cpu1.cr1,25          (load reg. with value 25)

  (2)        LI   Cpu1.cr2,35          (load reg. with value 35)

  (3)        LI   Cpu1.cr3,1           (load reg. with value 1)

  (4)        M    Cpu1.cr1,Cpu1.cr2    (multiply registers)

  (5)        SLL  Cpu1.cr3,20          (shift left logical 20 bits)

  (6)        *SA0 Cpu1.cr3,13,19       (s-a-0 bits 13 and 19)

  (7)        AD   Cpu1.cr1,Cpu1.cr3    (add registers)


Figure 3.4.1: Sample program segment.


Operation              Operation time

Fetch and Decode        2 time units
Logical shift           3 time units
Load                    6 time units
Addition               10 time units
Multiplication         19 time units


Figure 3.4.2: Sample operation times (arbitrary)


Clock
value     Operations performed

   0      Instruction (1) decoded, placed on
          the event queue, completion time = 8

   2      Instruction (2) decoded, placed on
          the event queue, completion time = 10

   4      Instruction (3) decoded, placed on
          the event queue, completion time = 12

   6

   8      Instruction (1) execution completed
          Instruction (4) decoded, placed on
          the event queue, completion time = 29

  10      Instruction (2) execution completed
          Instruction (5) decoded, placed on
          the event queue, completion time = 15

  12      Instruction (3) execution completed
          Instruction (6) decoded, placed on
          the event queue, completion time = 15
          Instruction (7) decoded, placed on
          the event queue, completion time = 39

  15      Instruction (5) execution completed
          Instruction (6) execution completed

  29      Instruction (4) execution completed

  39      Instruction (7) execution completed

Figure 3.4.3: Scheduling and execution of program
              segment in Figure 3.4.1 using
              operation times in Figure 3.4.2.

3.4.2 Program


The following sequence summarizes the operations performed
in Phase-III of the simulator:

     Initialize clock

     LOOP until all instructions are processed:

          Execute instructions with completion
               times <= clock

          Decode and queue instruction

          Increment clock by DECODETIME (if
               not fault-injection)

     End LOOP.


The clock is initialized to zero, and all instructions to be
executed at that time (any fault initializations specified at
time 0) are performed. The first instruction is then decoded
and placed on the event queue, and the clock is incremented if
the instruction was a regular instruction. The execute, decode
and queue, and increment sequence is then repeated until all
instructions are processed.

The event queue is represented by a doubly linked list,
with each node representing an operation specified by an
instruction in Phase-II. The entries on the list are sorted in
ascending order according to the event completion time. Thus,
the execution routine removes events from the queue and
performs the specified operation until the event completion
time of a node on the queue exceeds the clock value. The next
instruction is then decoded and placed on the queue and the
clock value incremented if necessary.
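
The loop and the last-completion-time record can be sketched
together as below, using the sample segment of Figure 3.4.1 and
the operation times of Figure 3.4.2. This Python sketch makes
several assumptions that are not taken from the simulator: a
binary heap (heapq) stands in for the sorted doubly linked list,
an operation is assumed to start at the later of "decode
finished" and "all operand fields free", a fault-injection
instruction is queued at the later of the current clock and the
field's last completion time, and once nothing is left to decode
the clock simply jumps to the next completion time. Only the
completion times are meant to be compared with Figure 3.4.3.

```python
import heapq

DECODE_TIME = 2
OP_TIMES = {"LI": 6, "SLL": 3, "AD": 10, "M": 19}

# (mnemonic, fields referenced); immediate operands are omitted here.
PROGRAM = [
    ("LI", ["Cpu1.cr1"]),
    ("LI", ["Cpu1.cr2"]),
    ("LI", ["Cpu1.cr3"]),
    ("M", ["Cpu1.cr1", "Cpu1.cr2"]),
    ("SLL", ["Cpu1.cr3"]),
    ("*SA0", ["Cpu1.cr3"]),
    ("AD", ["Cpu1.cr1", "Cpu1.cr3"]),
]


def schedule(program):
    """Return ({instruction: completion time}, [(clock, instruction), ...])."""
    clock = 0
    last_completion = {}   # field name -> last completion time
    queue = []             # heap of (completion time, instruction number)
    scheduled = {}
    completed = []
    next_index = 0
    while next_index < len(program) or queue:
        # Execute every queued event whose completion time has been reached.
        while queue and queue[0][0] <= clock:
            _, number = heapq.heappop(queue)
            completed.append((clock, number))
        if next_index < len(program):
            mnemonic, fields = program[next_index]
            number = next_index + 1
            ready = max(last_completion.get(f, 0) for f in fields)
            is_fault = mnemonic.startswith("*")
            if is_fault:
                completion = max(clock, ready)   # zero-time instruction
            else:
                start = max(clock + DECODE_TIME, ready)
                completion = start + OP_TIMES[mnemonic]
                for f in fields:
                    last_completion[f] = completion
            heapq.heappush(queue, (completion, number))
            scheduled[number] = completion
            next_index += 1
            if not is_fault:
                clock += DECODE_TIME
        elif queue:
            clock = queue[0][0]   # nothing left to decode: jump ahead
    return scheduled, completed
```

Under these assumptions, schedule(PROGRAM) yields completion times
of 8, 10, 12, 29, 15, 15 and 39 for instructions (1) through (7).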


3.5 Usage


The user is expected to supply a program for both Phase-I
and Phase-II of the simulator. These programs should conform
to the syntax indicated in Section 4.1 for Phase-I (creating
the system) and Section 4.2 for Phase-II (programming the
system).

When the object file for the simulator is executed, the
user will be requested to supply the following information:

     (1) the name of the file containing the
         program for Phase-I,

     (2) whether the fault initialisation file
         for Phase-II is present and the
         name of the file if it is present,

     (3) the name of a memory created in
         Phase-I where the program from
         Phase-II will be loaded, and

     (4) the name of the file containing the
         program for Phase-II.

The desired system will then be simulated.


To support the simulation of as many machines as possible,
a DEFINE file is provided which allows operation times to be
changed at the user's will. Operations are listed with their
corresponding operation times next to them. For example,
"define ADDTIME 8" states that the operation time for an
addition operation is eight clock units. To change this, the
user need only change the "8" to the desired value. The
DEFINE file is included within the simulator and the operation
times given in the DEFINE file are the ones used in the
simulation.


3.6 Simulator Flowcharts/Outlines

In order to aid the understanding of the program, the
following flowcharts/outlines are given. The following
conventions are used:

     (1) All subroutine names are capitalised in
         order to distinguish them from other
         information (e.g. CODEINST).

     (2) The increment convention of the language
         'C' is used (e.g. "progaddr =+ 2"
         indicates that the variable progaddr
         is incremented by 2).

     (3) If the purpose of the subroutine is
         obvious, a flowchart/outline is not
         included for that routine (e.g. GETLINE).

Instead of using strictly a flowchart format, many of the
processing boxes include brief outlines, as it was felt that
this would aid the reader's understanding.


[Flowchart: encoding of a two-operand instruction (TWO OP)]

     set bits 1-3 and 16-19
     get operand 1 (field), set bits 4-15 to field ID
     get operand 2 (field or label), set bits 20-31
     progaddr =+ 2
     if "I" (indirect) is present: set indirect bit


[Flowchart: decoding and queueing of an instruction]

     faultbit = bit 0
     optype   = bits 1-3
     opnd1    = bits 4-15
     opspec   = bits 16-19
     opnd2    = bits 20-31

     Call appropriate routine according to instruction type:

          TWOQ     BRANCHQ
          UNITQ    OTHERQ
          ONEQ     FAULTQ


[Outline: queueing of a field-operand instruction]

     if fault instruction:
          QEVENT(field completion time, operation,
                 addresses of operands)
     otherwise:
          calculate new completion time
          QEVENT(completion time, operation,
                 addresses of operands)
          reset time field to new completion time


[Outline: queueing of a special fault-injection instruction]

     QEVENT according to completion time, operation, field
     address and operands following the OPs below:

          *RAF:                     none
          *DE:                      opnd1 (value to increase time by)
          *SA0, *SA1, *RF0, *RF1:   (appropriate mask)
          *RSA0, *RSA1:             (appropriate mask, frequency)


[Outline: queueing of a unit-operand instruction]

     completion time = max(completion time of
                           fields in the unit)
     if fault instruction:
          QEVENT(completion time, operation, address
                 of first field in unit, memory limits for
                 memory instruction)
          reset field completion times in unit
     otherwise:
          calculate new completion time
          QEVENT(completion time, operation, address
                 of first field in unit, memory limits for
                 memory instruction)


[Outline: queueing of an immediate-operand instruction]

     if fault instruction:
          QEVENT(field completion time, operation,
                 address of opnd1, value of opnd2)
     otherwise:
          calculate new completion time
          QEVENT(completion time, operation,
                 address of opnd1, value of opnd2)
          reset time field to new completion time


XEVENTS

     while node time <= clock do:
          REMOVE(node)
          Call appropriate routine according to
          operation on node:

               TWOX     BRANCHX
               UNITX    OTHERX
               ONEX     FAULTX
               IMMX

     Note: the above routines perform operations on operands
           as specified in the queueing routines.


4. DESCRIPTION OF LANGUAGE


This chapter presents a syntactic description of the
languages used in Phase-I and Phase-II of the simulator. The
language will be presented using a format similar to the
commonly-used Backus Normal Form. Sequences of characters
enclosed in "()" represent entities whose values are strings of
symbols. An example of a syntactic rule is the following:

     (ab) := c | (ab)

where the mark ":=" means "is defined as" and the mark "|"
means "OR". Entities enclosed in "[ ]" indicate that the
entities are optional.


The following syntactic rules apply to both
Phase-I and Phase-II programs:

(digit) := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(letter) := a | b | c | d | e | f | g | h | i | j | k | l | m |
            n | o | p | q | r | s | t | u | v | w | x | y | z |
            A | B | C | D | E | F | G | H | I | J | K | L | M |
            N | O | P | Q | R | S | T | U | V | W | X | Y | Z

(identifier) := (letter) | (identifier)(letter) |
                (identifier)(digit)

(number) := [-](digit) | (number)(digit)

(numbers) := (number) | (number)(,numbers)

A symbol in a formula which is not a variable or a connective
denotes itself.

4.1 Phase-I: Definition Phase


Syntactic Description:

(ALU field) := r1 | r2 | m | d | s

(bus field) := s | d | m

(CPU field) := 2 | s | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8

(memory field) := m | s | d

(VSD field) := m | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | s | d

(peripheral field) := m | d1 | s | d

(field identifier) := (identifier).a(ALU field) |
                      (identifier).b(bus field) |
                      (identifier).c(CPU field) |
                      (identifier).m(memory field) |
                      (identifier).v(VSD field) |
                      (identifier).p(peripheral field)

(field identifier list) := (field identifier) |
                           (field identifier)(,field identifier list)

(command line) := (field identifier) > (field identifier list)

(program) := (command line) |
             (program)(command line)

Additional rule: a (field identifier) may appear only once on
                 the left of the ">" in the program. Future
                 references of that (field identifier) on the
                 left of the ">" will be ignored.

The definitions of "identifier" and "field identifier" in
this section will be as defined in Section 4.1.


A statement, S, may be labeled as follows:

     (identifier): S

Syntactic Description of Assembler File:

(unit identifier) := (identifier)

(label) := (identifier)

(statement) := (op1) (field identifier 1),(field identifier 2)[,I] |
               (op2) (field identifier 1),(label) |
               (op3) (unit identifier) |
               (op4) (unit identifier),(number),(number) |
               (op5) (field identifier) |
               (op6) (field identifier),(number) |
               (op7) (label)[,I] |
               (op8) (number),(label)[,I] |
               (op9) |
               (op10) (unit identifier),([I][O]) |
               (op11) (field identifier) |
               (op12) (field identifier),(number) |
               (op13) (field identifier),(numbers) |
               (op14) (field identifier),(numbers);(number)/(number)

     (note: operands for (op1) and (op14)
            should be on the same line)

(op1) := AD | SB | M | D | AN | OR | XR | MV | C |
         *AD | *SB | *M | *D | *AN | *OR | *XR | *MV | *C

(op2) := LDM | STM | *STM | *LDM

(op3) := ADU | SBU | MPU | DU | ANU | ORU | XRU | CLU |
         UDUMP | VOTE | SST | RST |
         *ADU | *SBU | *MPU | *DU | *ANU | *ORU | *XRU | *CLU |
         *UDUMP | *VOTE | *SST | *RST

(op4) := MDUMP | *MDUMP

(op5) := NEG | NOT | SQRT | ABS | CL | PAR |
         *NEG | *NOT | *SQRT | *ABS | *CL | *PAR

(op6) := ADI | SBI | MI | DI | ANI | ORI | XRI | LI | CI |
         SLL | SRL | SLA | SRA |
         *ADI | *SBI | *MI | *DI | *ANI | *ORI | *XRI | *LI | *CI |
         *SLL | *SRL | *SLA | *SRA

(op7) := B | BSA | *B | *BSA

(op8) := BC | *BC

(op9) := RET | HLT | *HLT | *RET | BOMB | *BOMB

(op10) := IO | *IO

(op11) := *RAF

(op12) := *DE

(op13) := *SA0 | *SA1 | *RF0 | *RF1

(op14) := *RSA0 | *RSA1

Comments:

Comments may follow the operands of instructions provided
the operands are followed by one or more blanks and a
semicolon. Random-stuck-at instructions are the only
exceptions and may not be followed by comments.


Pseudos:

Pseudo-operations are of the following form:

     $DEF number(,numbers)

          where numbers are as specified above

     $DEFO number(,numbers)

          where numbers are as specified above
          except that the digits must be integers
          from 0 (zero) to 7 (seven). The
          octal value is used here.


A statement is as defined above with the addition of a
mandatory number (the time a statement is to be executed)
preceding the instruction mnemonic. Also, the first character
of each instruction mnemonic must be "*", i.e., the instruction
must be a fault-injection instruction. We have the following
definition:

     (timing statement) := (number) (statement)

where (statement) is defined as indicated in the assembler file
(with mnemonics required to begin with "*"), and the (number)
specifies the time the instruction is to be executed. The
(number) may not be preceded by a minus sign (-) in this
statement, i.e., all numbers must be non-negative.

4.2.2 Language Mnemonics and their Description

In the following discussion, the notation "<-" indicates
"becomes"; for example, "NUM1 <- NUM1 + NUM2" states that the
value of NUM1 is changed to the sum of NUM1 and NUM2. The
notation "c(NAME)" refers to the contents of the field or
address with representation NAME, the compare notation
indicates a comparison operation, the notation "<<" indicates a
left shift, and the notation ">>" indicates a right shift. "OP"
represents the instruction mnemonics for a particular
instruction type.


Two operand instructions:

Type I:  OP (field identifier 1),(field identifier 2)[,I]

     Let v1 = c(field identifier 1)
         v2 = c(field identifier 2), if no I is present
              c(c(field identifier 2)), if I is present

     OP (mnemonic)     Operation

     [*]AD             v1 <- v1 + v2
     [*]SB             v1 <- v1 - v2
     [*]M              v1 <- v1 * v2
     [*]D              v1 <- v1 / v2
     [*]AN             v1 <- v1 & v2
     [*]OR             v1 <- v1 | v2   (inclusive or)
     [*]XR             v1 <- v1 ^ v2   (exclusive or)
     [*]MV             v1 <- v2
     [*]C              v1 compared with v2, set condition code
                       register

Type II:  OP (field identifier),(label)[,I]

     Let v1 = c(field identifier)
         v2 = c(label), if no I is present
              c(c(label)), if I is present

     OP (mnemonic)     Operation

     [*]LDM            v2 <- v1
     [*]STM            v1 <- v2


Unit Operand Instructions:

Type I:  OP (unit identifier)

     Note: the unit field acronyms are used in the
           table below to indicate the operands
           actually used by the operations
           (see Section 3.2.1).

     OP (mnemonic)   Unit type   Operation

     [*]ADU          ALU         d <- r1 + r2
     [*]SBU          ALU         d <- r1 - r2
     [*]MPU          ALU         d <- r1 * r2
     [*]DU           ALU         d <- r1 / r2
     [*]ANU          ALU         d <- r1 & r2
     [*]ORU          ALU         d <- r1 | r2   (inclusive or)
     [*]XRU          ALU         d <- r1 ^ r2   (exclusive or)
     [*]CLU          any         clears all fields of unit
     [*]UDUMP        any         prints all fields of unit
     [*]VOTE         VSD         d <- word representing
                                 majority of r...
     [*]SST          any         s <- 1
     [*]RST          any         s <- 0

Type II:  [*]MDUMP (unit identifier),(number 1),(number 2)

     - prints memory words from location (number 1)
       to location (number 2) of unit

     Note: (number 1) < (number 2)

Single operand instructions:

     OP (field identifier)

     Let v1 = c(field identifier)

     OP (mnemonic)     Operation

     [*]NEG            v1 <- twos complement of v1
     [*]NOT            v1 <- ones complement of v1
     [*]SQRT           v1 <- square root of v1
     [*]ABS            v1 <- absolute value of v1
     [*]CL             v1 <- 0
     [*]PAR            parity of v1 checked, condition
                       code register set


Immediate Operand Instructions:

     OP (field identifier),(number)

     Let v1 = c(field identifier)
         v2 = (number)

     OP (mnemonic)     Operation

     [*]ADI            v1 <- v1 + v2
     [*]SBI            v1 <- v1 - v2
     [*]MI             v1 <- v1 * v2
     [*]DI             v1 <- v1 / v2
     [*]ANI            v1 <- v1 & v2
     [*]ORI            v1 <- v1 | v2    (inclusive or)
     [*]XRI            v1 <- v1 ^ v2    (exclusive or)
     [*]LI             v1 <- v2
     [*]CI             v1 compared with v2, set condition code
                       register
     [*]SLL            v1 <- v1 << v2   (logical)
     [*]SRL            v1 <- v1 >> v2   (logical)
     [*]SLA            v1 <- v1 << v2   (arithmetic)
     [*]SRA            v1 <- v1 >> v2   (arithmetic)

Other Instructions:

     [*]IO (unit identifier),([I,O])

          Operation: If I: Input into unit
                     If O: Output from unit

     [*]BOMB

          Operation: program immediately stops,
                     instructions remaining on the
                     event queue are not executed.

     OP (mnemonic)     Operation

     [*]B              program counter <- a1

     [*]BSA            program counter placed on
                       stack, program counter <- a1

Type II:  BC (number),(label)[,I]

     Let a1 be defined as above.

     number = a condition code number
     The condition codes are indicated as follows
     (leftmost bit is bit 0):

     Condition         Bit number set    Condition Code Number

     equal                  31                  01
     less than              30                  02
     greater than           29                  04
     parity (odd)           28                  08
     parity (even)          27                  016

     Operation: the condition code number is compared
                to the condition code register. If any of the
                logical 1 bits of the values compared match,
                the branch is performed as in Type I.
                Otherwise, no branch is performed.
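
The branch test amounts to a bitwise AND of the two values. A
minimal Python sketch follows; the constant and function names
are hypothetical, and the numeric weights are simply the condition
code numbers from the table above read as decimal values.

```python
# Condition code numbers from the table (bit 31 is the rightmost bit).
EQUAL = 1
LESS_THAN = 2
GREATER_THAN = 4
PARITY_ODD = 8
PARITY_EVEN = 16


def branch_taken(condition_code_number, condition_code_register):
    """True when any logical 1 bit of the two values matches."""
    return (condition_code_number & condition_code_register) != 0
```

A condition code number may combine several conditions, so a
value of 3 (equal or less than) acts as "branch if less than or
equal".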

Type  ill:  0? 

0?  (mnemonic)  Operation 

[*]HLT  branch  to  and  of  program 

[ * ]2ZT  pop  address  from  stack, 

pc  <-  address  + 1 (next  word) 
Special Fault Injection Instructions:

Type I:  *RAF (field identifier)

    Operation:  error register associated with
                (field identifier) is set to 0
                s-a-1 masks set to logical 0's
                s-a-0 masks set to logical 1's

Type II:  *DE (field identifier),(number)

    Operation:  adds (number) to the last completion
                time register associated with (field
                identifier)

Type III:  OP (field identifier),(numbers)

OP (mnemonic)    Operation

*SA0             bits corresponding to (numbers)
                 are assigned logical 0 in
                 s-a-0 mask associated with
                 (field identifier)
                 error register "bit 31" set to 1

*SA1             bits corresponding to (numbers)
                 are assigned logical 1 in s-a-1
                 mask associated with (field
                 identifier)
                 error register "bit 30" set to 1

*RF0             bits corresponding to (numbers)
                 are assigned logical 1 in s-a-0
                 mask associated with (field
                 identifier)
                 if s-a-0 mask bits are all
                 logical 1, appropriate bits in
                 error register are set to 0

*RF1             bits corresponding to (numbers)
                 are assigned logical 0 in s-a-1
                 mask associated with (field
                 identifier)
                 if s-a-1 mask bits are all
                 logical 0, appropriate bits in
                 error register are set to 0

Type IV:  OP (field identifier),(numbers);(number1)/(number2)

(note: instruction should be on a single line)

OP (mnemonic)    Operation

*RSA0            same operations as performed in
                 *SA0 except "bit 29" is set
                 to 1; additionally, frequency =
                 (number1) divided by (number2)
                 is placed in frequency field
                 associated with field identifier

*RSA1            same operations as performed in
                 *SA1 except "bit 28" is set
                 to 1; additionally, frequency =
                 (number1) divided by (number2)
                 is placed in frequency field
                 associated with field identifier
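
The mask conventions above (s-a-0 mask all 1's and s-a-1 mask all 0's when fault-free) suggest that a faulty field value can be obtained by ANDing with the s-a-0 mask and ORing with the s-a-1 mask. The sketch below illustrates that reading only. It assumes a 32-bit word with bit 0 as the leftmost bit; all type and function names are hypothetical and are not taken from the simulator.

```c
#include <stdint.h>

#define WORD_BITS 32u   /* assumed word width; bit 0 is the leftmost bit */

/* Mask with only bit n set, counting from the left. */
static uint32_t bit_mask(unsigned n)
{
    return UINT32_C(1) << (WORD_BITS - 1u - n);
}

struct fault_masks {
    uint32_t sa0;   /* 0 where a bit is stuck at 0; all 1's when fault-free */
    uint32_t sa1;   /* 1 where a bit is stuck at 1; all 0's when fault-free */
};

/* Counterpart of *RAF: remove every injected fault. */
static void masks_clear(struct fault_masks *m) { m->sa0 = UINT32_MAX; m->sa1 = 0; }

/* Counterparts of *SA0, *SA1, *RF0 and *RF1 for a single bit number. */
static void inject_sa0(struct fault_masks *m, unsigned bit) { m->sa0 &= ~bit_mask(bit); }
static void inject_sa1(struct fault_masks *m, unsigned bit) { m->sa1 |= bit_mask(bit); }
static void remove_sa0(struct fault_masks *m, unsigned bit) { m->sa0 |= bit_mask(bit); }
static void remove_sa1(struct fault_masks *m, unsigned bit) { m->sa1 &= ~bit_mask(bit); }

/* Value observed once the stuck-at faults are applied (assumed rule). */
static uint32_t apply_faults(const struct fault_masks *m, uint32_t value)
{
    return (value & m->sa0) | m->sa1;
}
```

With both masks in their fault-free state, `apply_faults` returns the value unchanged, which matches the initial state that *RAF establishes.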
One possible use of the simulator is illustrated in the
following example where the fault-tolerance of a system is
examined subject to the injection of faults at different
arrival rates.  Interarrival times of the faults were
exponentially distributed with the following three arrival
rates:

rate 1:  1/2000

rate 2:  1/5000

rate 3:  1/10000

The system studied is illustrated in Figure 5.1, with the
Phase-I input file which created the system listed in Figure
5.2.  To inject the desired faults, a program was written which
generated the fault-initialization file, with faults arriving
according to exponential arrival times; an example is shown in
Figure 5.3.  The Phase-II program, which is listed in Figure 5.4,
loops twenty times, performing a simple arithmetic calculation
in each of the six arithmetic-logic units, and voting twice for
verification.  With no faults injected, this program required
17,067 clock units to complete using operation times listed in
Figure 5.5 (from the define file).
To study the fault tolerance of the system, one hundred
runs were made with each arrival rate and the number of
"failures" and "successes" recorded.  A failure was defined as
"no majority value in the vsd", or "improper time to complete
the Phase-II program", i.e., a time other than 17,067 time
units.  A successful run was defined as one which did not fail.

The results of the computer runs are listed in Figure 5.6.
As would be expected, the decreasing arrival rate of faults
resulted in a greater number of successful completions of the
program.  The tolerance of the system according to these
arrival rates is thus illustrated.
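
The failure criterion used for these runs can be written down directly. The sketch below assumes each run reports whether the voters found a majority and the clock value at completion; the names are hypothetical, and only the figure of 17,067 clock units comes from the text.

```c
#define FAULT_FREE_TIME 17067L   /* clock units for the fault-free run */

/* A run succeeds only if a majority was found and the program finished
   in exactly the fault-free completion time. */
static int run_succeeded(int majority_found, long completion_time)
{
    return majority_found && completion_time == FAULT_FREE_TIME;
}

/* Count the successful runs in a batch of `runs` results. */
static int count_successes(const int *majority_found,
                           const long *completion_time, int runs)
{
    int i, successes = 0;

    for (i = 0; i < runs; i++)
        if (run_succeeded(majority_found[i], completion_time[i]))
            successes++;
    return successes;
}
```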

It  should  be  noted  that  this  example  serves  to  illustrate 
only  one  use  of  the  simulator.  Many  other  uses  as  mentioned 
earlier  are  also  supported. 
MAINcpu.cr1 > mainmem.md

MAINcpu.cr2 >
        alu21.ar1,
        alu22.ar1,
        alu23.ar1,
        alu11.ar1,
        alu12.ar1,
        alu13.ar1

MAINcpu.cr7 >
        alu21.ar2,
        alu22.ar2,
        alu23.ar2,
        alu11.ar2,
        alu12.ar2,
        alu13.ar2

alu21.ad > vsd2.vd1
alu22.ad > vsd2.vd2
alu23.ad > vsd2.vd3

alu11.ad > vsd1.vd1
alu12.ad > vsd1.vd2
alu13.ad > vsd1.vd3
vsd1.vd > VSDfail.vd1
vsd2.vd > VSDfail.vd2

AnaDig.pd1 > MAINcpu.cr2


Figure 5.2:  Phase-I input used to create the system in
             Figure 5.1.
926

*R  SA1 

AcaBig.  pd  , 21 , 2 4, 5 ; 1/74 

5663 

* M CT 

MAINcpu.  cr3 

7043 

-3  A 1 

alul  2-31,26,15,3 

13263 

*X5Z 

MAINcpu. cr 2, 3533 

20592 

=*S  50 

alul  3 

27122 

*2  SAO 

alu22- ar2 , 5 ; i/93 

28016 

*S  A1 

main  sea,  ad , 29, 1 4, 1 S; 

29049 

*D  5 

MAINcpu. crl, 21141 

33173 

*S  AO 

alul 2. ar  1 , 0 

344  8 3 

- S AO 

AnaD ig-pd,31,24,5,22 

34903 

*S  AO 

MAIN cp  u. cr3 , 29 , 14 

36123 

*S  A1 

MAINcpu. cr8 , 21 

40453 

*S  AO 

alul  2-  ar  1 , 0 

4229  1 

*3  S AO 

alu22. arl ,31;  1/53 

4295  3 

*3SA0 

aairaea . nd , 23 , 16,29,14; 

U44?3 

*D0 

alu  12 

46537 

*S  AO 

alu23.ar2, 13,7, 0,13 

48222 

*CL 

aaicaea- aa 

33450 

*2  SA1 

AnaDig- pd, 1,2, 23;  1/36 

55310 

*L  I 

MAINcpu. cr2, 442 

56313 

*S  PA 

alul 3- ar 1 , 20 

57057 

*S  AO 

MAINcpu.  crS  ,3,12,9,10 

53285 

SA 1 

aainaera-ad , 1, 2 ,23 ; 1/50 

53635 

®S  A 1 

alu 2 2.  ar  2,  15 

59994 

*L  I 

rtnaDig- pd 1 , 3921 

Figure 5.3:  Sample fault-initialization program
             with exponential interarrival times


loop : 


boob : 




LI 

vsdl.va,  3 

LI 

vsd2.73,  3 

LI 

YSDfail. 73 

V 

/ — 

LI 

AnaDig.  pa. 

1 

LI 

MAINcpul  cr 

■a,  20 

10 

I,  AnaDig 

MY 

MAIN cpu.cr 

2,  AnaDig. pdl 

10 

I,  AnaDig 

MY 

MAIN  cpu.cr  3, AnaDig. pdl 

SI 

MAINcpu. cr2 , 102  ' 

SI 

MAIN cp  u. cr3 , 50 0 

MY 

alul  1.  ar  1 , 

MAIN cpu . cr2 

MY 

alu12.  arl. 

MAIN  cru.cr2 

MY 

alul  3-  ar  1 , 

MAIN cpu.cr 2 

MY 

alu2  1.  ar  1 , 

MAIN cpu.cr! 

MY 

alu22-  ar  1 , 

MAIN cpu.cr 2 

MY 

alu23.  ar  1 , 

MAIN cnu. cr 2 

MY 

alul 1. ar2. 

MAIN  cpu . cr3 

MY 

alul  2.  ar2. 

MAIN cpu.cr 3 

KY 

alul  3.  ar 2, 

MAIN  cpu . cr 3 

MY 

alu2 1. ar2. 

MAIN  cpu . cr3 

MY 

alu22.  ar 2, 

MAINcpu. cr3 

MY 

alu23. ar2. 

MAIN cpu. cr3 

c?.  a 

alul  1 

03  D 

alul  2 

OHO 

alul  3 

o?.  a 

alu2  1 

03  0 

a 1 u 2 2 

020 

alu23 

MY 

vsdl. vd  1 , 

alul 1 .ad 

MY 

vsdl . vd2 , 

alu12.ad 

MY 

vsdl.vdl. 

alul 3. ad 

MY 

vsd2. vd  * , 

alu21 . ad 

MY 

7sd2.vd2 , 

alu22 - ad 

MY 

vsd2. vd3  , 

alu23 . ad 

7013 

7Sd1 

Y0I3 

7sd2 

M7 

YSDfail. vd 1,  vsdl . vd 

MY 

YSDfail. 7d2,  vsd2. vd 

YO 13 

YSDfail 

Cl 

YSDfail.  vd 

, 502 

3C 

6,  bonb 

S3  I 

MAIN  cp u. cr 

1 

Cl 

MAIN cpu.cr 

- , o 

BC 

6,  loop 

KIT 

BC  M3 

Figure 5.4:  Phase-II input used to program the system.
#define  ADDTIME      10
#define  MULTIME      15
#define  DIVTIME      20
#define  ANDTIME      5
#define  ORTIME       3
#define  XORTIME      5
#define  MOVETIME     3
#define  CPRTIME      25
#define  CZCCDETIME   1
#define  NEGTIME      3
#define  CPLTIME      3
#define  SQRTIME      45
#define  ABSTIME      12
#define  CLRTIME      5
#define  PARTIME      6
#define  LOADTIME     3
#define  LSHTIME      15
#define  ASHTIME      20
#define  BCHTIME      3
#define  IOTIME       350
#define  DUMPTIME     20
#define  VOTETIME     22
#define  FETCHTIME    20

Figure 5.5:  Operation times as specified in the
             define file of the example.


arrival rate  |  number of successful completions
of faults     |  of 100 possibilities
--------------+----------------------------------
1/2000        |  75
1/5000        |  89
1/10000       |  93


Figure 5.6:  Number of successful completions of
             program listed in Section 5.4 subject
             to exponential interarrival times with
             arrival rates as specified.
6.  CONCLUSION

The simulator described in this thesis provides a means of
studying the reliability of many system configurations subject
to any number of fault distributions.  It provides a means of
defining a system by using a set of unit types, and a means of
studying the reliability of the systems defined by supporting
the injection of faults and providing a means of programming
error-detection and error-recovery.

The simulator thus combines the benefits of real hardware
organizations and analytical models by supporting the modeling
of faults in a wide range of systems and facilitating
performance evaluation based on the processing of typical
workloads subject to these faults.
#include "define"
struct buf {
        int fildes;
        int nleft;
        char *nextp;
        char buff[512];
};

struct type0 {
        char aname[9];
        int ar1[FLDSIZE], ar2[FLDSIZE], aa[FLDSIZE],
            ad[FLDSIZE], as[FLDSIZE];
};


struct  type1 
char 

int 


int 

}; 


anaae[ 9 ]; 

aa[ FtDSTZS], 
as[ F1DSTZ3], 
adfriDSISS]; 

pd phase; * 


struct type2 {
        char cname[9];
        int ca[FLDSIZE],
            cs[FLDSIZE],
            cr1[FLDSIZE], cr2[FLDSIZE], cr3[FLDSIZE], cr4[FLDSIZE],
            cr5[FLDSIZE], cr6[FLDSIZE], cr7[FLDSIZE],
            cr8[FLDSIZE];
};

struct type3 {
        char bname[9];
        int bs[FLDSIZE],
            bd[FLDSIZE],
            ba[FLDSIZE];
};

struct type4 {
        char vname[9];
        int va[FLDSIZE],
            vd1[FLDSIZE], vd2[FLDSIZE],
            vd3[FLDSIZE], vd4[FLDSIZE],
            vd5[FLDSIZE], vd6[FLDSIZE],
            vd7[FLDSIZE], vd8[FLDSIZE],
            vs[FLDSIZE],
            vd[FLDSIZE],
            vr1[FLDSIZE];
};

struct type5 {
        char pname[9];
        int pa[FLDSIZE], pd[FLDSIZE],
            ps[FLDSIZE], pd1[FLDSIZE];
};

char *fields[] {
        "ar1", "ar2", "aa", "ad", "as",
        "ma", "ms", "md",
        "ca", "cs", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6",
        "cr7", "cr8",
        "bs", "bd", "ba",
        "va", "vd1", "vd2", "vd3", "vd4", "vd5", "vd6", "vd7",
        "vd8", "vs", "vd", "vr1",
        "pa", "pd", "ps", "pd1",
        0
};

int displace[] {
        0, 1, 2, 3, 4,
        0, 1, 2,
        0, 1, 2, 3, 4, 5, 6, 7,
        8, 9,
        0, 1, 2,
        0, 1, 2, 3, 4, 5, 6, 7,
        8, 9, 10, 11,
        0, 1, 2, 3,
        0
};


int fldtype[] {
        1, 1, 1, 2, 2,
        1, 2, 3,
        1, 2, 3, 3, 3, 3, 3, 3,
        3, 3,
        2, 3, 3,
        1, 1, 1, 1, 1, 1, 1, 1,
        1, 2, 2, 2,
        1, 1, 2, 2,
        0
};

char *type[NUMUNITS] {
        "ALU",
        "MEM",
        "CPU",
        "BUS",
        "VSD",
        "PER"
};

char *errmesg[] {
        "illegal name.field format",
        "unrecognized field name",
        "[ > ] token expected",
        "inconsistent unit-type specification",
        "output field expected before [ > ]",
        "outputs previously defined",
        "input field expected after [ > ]",
        "unit quota filled - see define file for quotas",
        "illegal intra-unit connection",
        "unexpected eof encountered",
        "unidentifiable unit-name",
        "invalid delimiter or improper bit specification format",
        "improper bit specification format--expecting bit number",
        "invalid bit number-- > 31 or < 0 or non-numeric",
        "warning: illegal freq specif--expecting '/', '/' assumed",
        "invalid frequency specification-expecting positive integer",
        "invalid frequency specification--operand missing",
        "invalid delimiter--expecting
        "unexpected eof or ';' --missing parameter or operand",
        "invalid parameter--expecting 'I' if any should be present",
        "invalid I/O parameter--expecting 'I' or 'O' ",
        "immediate operand value exceeds range",
        "invalid character or bad name format",
        "unrecognizable instruction",
        "invalid boundary specifications for memory dump",
        "branch to nonexistent label",
        "incorrect or missing unit name",
        "operand for DE instruction must be positive integer",
        "invalid pseudo op",
        "illegal condition code",
        0
};


struct 

buf  di 

.sfcbuf,  *iobuf 

» 

struct 

typeO 

alu[  i*.  AIAIO  ], 

*alu  ptr. 

"alutontr 

struct 

typai 

aea[  BAXMS.I  ], 

"aeaptr. 

*se=t optr 

struct 

type! 

cpu[ SAXC3U ] , 

*c?u  ptr. 

*cputo.p  tr 

struct 

fvpe3 

bus[  SA'C30S], 

"bus  ptr. 

•bcstop  tr 

struct 

typeu 

vsd[2AX7SD  ], 

*vsd  ptr. 

* vsdtop  tr 

ft 

struct 

type; 

per( KAIP3S], 

"per  ptr. 

•pertop  tr 

char  line!  30  ],  *lineptr,  phase  Ifilef 72 ],  unitr.a:[?],  fldnanfr]; 
char  chasa2f ile[  72 ]: 
char  f ault  f ile[ 72  ] ; 

iat  *arc[l00],  *arc?tr,  topiadexf  rJ.tnsiTS  ],  ucitiadex,  fldisp, 
oldfldisp,  sequence,  fldindex,  baseaddr,  = aaory[  3132  ]; 
char *inst[] {
        "AD",           /*  0  2-op add */
        "SB",           /*  1    "  subtract */
        "M",            /*  2    "  multiply */
        "D",            /*  3    "  divide */
        "AN",           /*  4    "  and */
        "OR",           /*  5    "  or */
        "XR",           /*  6    "  exclusive-or */
        "MV",           /*  7    "  move */
        "C",            /*  8    "  compare */
        "LDM",          /*  9    "  load register from memory */
        "STM",          /* 10    "  store from register to memory */
        "ADI",          /* 11  immediate-value add */
        "SBI",          /* 12    "  subtract */
        "MI",           /* 13    "  multiply */
        "DI",           /* 14    "  divide */
        "ANI",          /* 15    "  and */
        "ORI",          /* 16    "  or */
        "XRI",          /* 17    "  exclusive or */
        "LI",           /* 18    "  load */
        "CI",           /* 19    "  compare */
        "SLL",          /* 20  shift left logical immediate value */
        "SRL",          /* 21  shift right logical immediate value */
        "SLA",          /* 22  shift left arithmetic immediate value */
        "SRA",          /* 23  shift right arithmetic immediate value */
        "B",            /* 24  branch unconditionally */
        "BC",           /* 25  branch conditionally */
        "BSA",          /* 26  branch and save return address */
        "HLT",          /* 27  halt */
        "RET",          /* 28  return */
        "NEG",          /* 29  2's complement */
        "NOT",          /* 30  1's complement */
        "SQRT",         /* 31  square root */
        "ABS",          /* 32  absolute value */
        "CL",           /* 33  clear */
        "PAR",          /* 34  check parity and set parity bit */
        "ADU",          /* 35  unit add (alu) */
        "SBU",          /* 36  unit subtract (alu) */
        "MU",           /* 37  unit multiply (alu) */
        "DU",           /* 38  unit divide (alu) */
        "ANU",          /* 39  unit and (alu) */
        "ORU",          /* 40  unit or (alu) */
        "XRU",          /* 41  unit exclusive or (alu) */
        "CLU",          /* 42  clear all fields in unit */
        "UDUMP",        /* 43  unit dump */
        "MDUMP",        /* 44  memory dump */
        "VOTE",         /* 45  vsd activation */
        "SST",          /* 46  set unit status */
        "RST",          /* 47  reset unit status */
        "IO",           /* 48  input/output */
        "BOMB",         /* 49  bomb--stop program */
        0
};

/* 

char  *s 

fins 

« C ] 

( /•  special  fault  instructions 

"•  3 A 7 " , 

/* 

7 

renave  all  fault  injections  fron  f. 

"•02"  , 

/* 

2 

dead  end — add  specified  clock  unit: 

*"*SA0",  /* 

"*SA1",  /• 

"* ?. S A3  " , /* 


e7ent  in  event  queue  for  field 
designated  bits  to  be  stuck  at  0 
designated  bits  to  be  stuck  at  1 


"•2SA1",  /■» 


"*270",  /* 

"■271",  /* 

0 


designated  bits  to  be  stuck  at  0 at  specif 
interval  ■/ 

designated  bits  to  be  stuck  at  1 'at  specifi; 
interval  ■/ 

reuove  stuck  at  0 faults  for  designated  bi 
raoove  stuck  at  1 faults  far  designated  bi 


ts  */ 
ts  «/ 


*psaudos(  ] ( 
"327", 
"D27C", 

0 


r?2ir.st[  ] ( /*  instructions  requiring  prccessina  by  oass2  */ 

"3"  , 

"•3", 

"SC", 

"•SC", 

"3S  A " , 

"• 3SA" , 

"IDS", 

"•13.1" , 

"SCI", 

"•37.1", 

0 


X?.?.XZ  32GC932:  code  Rushers  for  bits  15-13  or  16-19  fo: 
regular  instructions  and  corresponding  fault  injection 
(arrays  ZM57  and  7IMST  above 
iD  t rsgcsde[ -2  ] [


coo  , 

/* 

0 

*/ 

010000  , 

/* 

1 

V 

020000  , 

/* 

2 

*/ 

030000  , 

/* 

3 

-/ 

oaoooo  , 

/- 

u 

*/ 

050000  , 

/« 

5 

*/ 

060000  , 

/* 

6 

*/ 

070000  , 

/* 

7 

*/ 

0100000, 

/* 

8 

V 

0110000, 

/* 

9 

«/ 

01200C0, 

/* 

10 

V 

000  , 

/* 

11 

V 

010000  , 

/* 

12 

*/ 

020000  , 

/* 

13 

V 

030000  , 

/» 

14 

V 

ouoooo  , 

/* 

15 

*/ 

osooco  , 

/* 

16 

*/ 

C60000  , 

/- 

17 

V 

070000  , 

/* 

13 

V 

0100000, 

/* 

19 

*/ 

C110000, 

/* 

20 

•/ 

0120000, 

/* 

21 

V 

0130000, 

/* 

22 

V 

omoooo, 

/* 

23 

*/ 

000  , 

/* 

24 

*/ 

020000  , 

/- 

25 

V 

040000  , 

/* 

25 

V 

060000  , 

/* 

27 

V 

0100C00, 

/* 

23 

V 

000  , 

/' 

29 

V 

010000  , 

/' 

30 

V 

020-000  , 

/* 

31 

V 

030000  , 

/- 

32 

V 

OUOOOO  , 

/* 

33 

V 

050000  , 

/* 

34 

V 

000 

/* 

35 

V 

010000  , 

/* 

36 

V 

020000  , 

/* 

3? 

V 

030000  , 

/* 

33 

V 

CttOOCO  , 

/* 

39 

V 

050000  , 

/* 

9C 

*/ 

060 000  , 

/* 

41 

V 

070000  , 

/* 

42 

V 

0100000, 

/* 

43 

V 

011  0000, 

/* 

44 

*/ 

0120C00, 

/* 

45 

V 

013C000, 

/■ 

36 

V 

0100000, 

/* 

37 

V 

000  , 

/* 

43 

V 

020000  , 

/• 

49 

V 

tcod  e[  3 ] C 

000  , 

0100C0, 

020000, 

0200CC, 

0O00C0, 

050000, 

C60000, 

070000 

int ones[] {    /* to make the ith bit of a word equal to 1,
                   OR the word with ones[i] */
        0100000,        /* bit 0 */
        040000,         /* bit 1 */
        020000,         /* bit 2 */
        010000,         /* bit 3 */
        04000,          /* bit 4 */
        02000,          /* bit 5 */
        01000,          /* bit 6 */
        0400,           /* bit 7 */
        0200,           /* bit 8 */
        0100,           /* bit 9 */
        040,            /* bit 10 */
        020,            /* bit 11 */
        010,            /* bit 12 */
        04,             /* bit 13 */
        02,             /* bit 14 */
        01,             /* bit 15 */
        0
};

int zeroes[] {  /* to zero the ith bit of a word, AND the word
                   with zeroes[i] */
        077777,         /* bit 0 */
        0137777,        /* bit 1 */
        0157777,        /* bit 2 */
        0167777,        /* bit 3 */
        0173777,        /* bit 4 */
        0175777,        /* bit 5 */
        0176777,        /* bit 6 */
        0177377,        /* bit 7 */
        0177577,        /* bit 8 */
        0177677,        /* bit 9 */
        0177737,        /* bit 10 */
        0177757,        /* bit 11 */
        0177767,        /* bit 12 */
        0177773,        /* bit 13 */
        0177775,        /* bit 14 */
        0177776,        /* bit 15 */
        0
};
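
The two tables above hold one 16-bit mask per bit position, with bit 0 as the leftmost bit. As an illustration only, the same masks can be computed with shifts instead of being looked up; the function names below are hypothetical and the listing itself uses the tables.

```c
#include <stdint.h>

/* Mask with only bit i set, counting bit 0 as the leftmost bit
   of a 16-bit word (same value as ones[i] in the listing). */
static uint16_t one_mask(unsigned i)
{
    return (uint16_t)(0100000u >> i);
}

/* Mask with every bit set except bit i (same value as zeroes[i]). */
static uint16_t zero_mask(unsigned i)
{
    return (uint16_t)(~(unsigned)one_mask(i) & 0177777u);
}

/* Set or clear bit i of a word, as the table comments describe. */
static uint16_t set_bit(uint16_t w, unsigned i)   { return (uint16_t)(w | one_mask(i)); }
static uint16_t clear_bit(uint16_t w, unsigned i) { return (uint16_t)(w & zero_mask(i)); }
```

Valid bit numbers for this sketch are 0 through 15; the tables end each list with a 0 terminator that the shift form does not need.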

tea?. 

errors. 

/*  keep  t 

rtcfc 

of  PHASZ  I errors  */ 

fid  id,  / 

* displace 

tent 

within  structure  of  given  field  •/ 

count,  / 

* teaporar 

y counting  variable  “/ 

aeaccunt 

, /*  reiat 

i 

address  of  word  of  aeaory  being  oointed 

iastnua. 

to  by 
/“  row  r.u 

proaaddr  */ 

aber  in  array  where  instruction  is  found  «/ 

iat 


syaaddr^  ST"SI2Z  ],  /’  address  a f label  in  corresponding  rev 

o i syatab  •/ 

cond,  /*  whore  condition  code  frea  cond.  branch  is  placed*/ 
dua  ey; 


*p2addrC  ?fff”3HAffC5t  J,  /•  odp  addresses  of  branch  instructions  • 

asseably  code  */ 

»“p2artptr,  /*  pointer  to  p2addr, points  to  next  unused  eleneat*/ 
»*p2eadacdr,  /»  largest  value  of  »“p2adotr  */ 

•asynptr,  /"  pointer  tc  syaaddr  */ 

“seastart,  /*  POP  address  where  progma  binary  code  begins  */ 
•progbeg,  /«  address  of  beginning  of  asseabled  progran  */ 
“progend,  /“  ?D?  address  where  progran  binary  code  ends  */ 
•progaddr;/*  pointer  tc  line  in  neoory  where  coi»  is  currently 
being  placed  cr  points  to  next  ar.used  word  at 
beginning  of  rev  instruction  */ 

iat  “spare;  /“this  pointer  is  cci.tg  to  aata  everything  better  — eib*/ 
char  un it nar.e[  ^ , /*ar.itnaoe  will  be  placed  here"/ 


r 
fieldname[9],   /* fieldname will be placed here */
xeaload[9],  /"  nane  of  neoory  where  progrin  will  he  leaded*/ 
syatahf  SISSIES  ][  9 ],  /*  synboi  table  */ 
nane[9],  /•  vhere  a r.a  is  fzaa  input  is  placed  before 
being  deciphered  »/ 

c;  /*  tenporary  storage  of  a character  */ 
char  "alptr,  /*  ptr  to  nane  of  aesory  read  in  to  aeaload  * / 

"synptr,  /*  pointer  to  next  blank  space  in  synfcol  table  */ 
"synend,  /*  pointer  to  end  of  synboi  table  */ 

•naoeptr,  /" pointer  tc  string  naae  */ 

"cptr;  /*  teepory  pointer  to  character  »/ 

int  nunf ialdsf 6 1 { 

5,  /*  alu  */ 

3,  /*  aeo  */ 

10,  /*  epu  */ 

3,  /"  bus  */ 

12,  /*  7sd  «/ 

4 /*  peripheral  */ 


PHASE  3 DECT. A?.  A? TOSS 


int 

faaltbit,  /*  = 1 if  inst  being  decoded  is  fault  injection, 

othervisa  =0  »/ 

cptype,  /*  operation  type  of  instruction  being  decoded  "/ 
cpndl,  /"operand  1 cf  instruction  being  decoded  * / 

epnd2,  /"operand  2 cf  instruction  heir.g  decoded  */ 

opspec,  /"operation  specification  within  type  for 

instruction  being  decoded  */ 

onittype, 

condcode,  /»  condition  code  register  */ 

"tptr,  /*  terporary  pciater  »/ 

initialization,  /*  if  1,  indicates  interpreting  the 
fault  initialization . file,  if  0, 
regular  execution  */ 

dua; 

int  tvec[2]; 
long 

loagone,  /*  long  one  used  in  assigning  value  1 to  longs"/ 
lcond, 

coadtine,  /■  tine  +ield  for  condit  coda  */ 

■tine. 

It,  /"  long  zero  to  use  in  passing  to  subroutines  »/ 
fintiae,  /*  conpletior.  tine  of  operation  being  decoded  */ 
fltine,  /*  the  tiae  a fault  initialization  icst  in  the 
fault  initialization  file  is  to  occur"/ 
clock;  /*  universal  clock  for  all  operations*/ 
extern  long  seed; 

int  "regsa 7e[ 10 /•  stack  for  branch  and  save  */ 
rscour.t;  /"subscript  for  ragsa7e  »/ 

int 

*?a;  /*  pointer  to  current  line  of  cade  being  decoded 
to  be  placed  on  event  queue  */ 

struct node {
        long time;
        int type;
        int spec;
        char *add1;
        char *add2;
        int val;
        long mask;
        float freq;
        struct node *prev;
        struct node *next;
        int used;
};

struct node
        *head,          /* pointer to beginning of event queue */
        *tail,          /* pointer to end of event queue */
        *p;             /* pointer to move down event queue */

struct node
        tailnode,
        headnode;

struct node nodes[NUMNODES];    /* unused nodes to be used for creation
                                   of nodes for event queue */
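
The declarations above describe a doubly linked event queue with separate head and tail nodes. The sketch below shows one way such a queue could be kept in time order; it assumes the head and tail nodes act as sentinels and that events are ordered by their time field, neither of which is confirmed by the listing. The trimmed-down struct and the function names are hypothetical.

```c
#include <stddef.h>

struct ev {                     /* hypothetical, trimmed-down event node */
    long time;
    struct ev *prev, *next;
};

/* Initialise an empty queue between two sentinel nodes. */
static void queue_init(struct ev *head, struct ev *tail)
{
    head->prev = NULL;
    head->next = tail;
    tail->prev = head;
    tail->next = NULL;
}

/* Insert n so that times are non-decreasing from head to tail;
   events with equal times keep their insertion order. */
static void queue_insert(struct ev *head, struct ev *tail, struct ev *n)
{
    struct ev *q = tail->prev;

    /* Walk back from the tail past every later event. */
    while (q != head && q->time > n->time)
        q = q->prev;

    n->prev = q;
    n->next = q->next;
    q->next->prev = n;
    q->next = n;
}
```

Searching from the tail keeps insertion cheap when most new events are scheduled later than those already queued.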

/* 
*/

aainQ  C 


nainl  ()  ; 
if  (errors)  { 

priatf ("error  detected  in  ?HAS2  I - abortingcn") ; 
goto  er.dcrcg; 

} 

else 

printf ("rncnAll  set  to  proceed  vith  PHASS  XIcnon") ; 

sain2  0 ; 
if  (errors) 

printf  ("sn*""”"'d  assembler  errors,  phase  3 skipped”, 
errors) ; 

else  printf  ("an !!!!!! no  assenbler  errors", 

" phase  3 initiatiated") ; 
progend  = progaddr; 
print;  ("cnneaory  values  are  an")  ; 
progaddr  = nenstart; 

printf  ("sn  progend  = 5d,  nenstart  = *dsn",  progend, reastart)  ; 
while  (progaddr  < progend) 

{ ' printf  ("  5o  Seen", tr eg ad dr [ 0],?rogadir[  ' ])  ; 

progaddr  =♦  2; 

} 


print;  ("nr.  or  on 
pa  = progbag; 

3a in 3 ()  ; 
end trog:  ; 

) 

aainl  ()  { 

char  *structptr, 
oatype ; 


P3ASI  2 HEX 2 =n=n")  ; 


/•  general  purpose  structcra  pointer  •/ 
/•  oat  put  unit  type  variable  •/ 


int 


conaa,  "ptr, 
catindes; 


/*  general  purpose  integer  variables  •/ 
/•  structure  index  of  the  output  unit  •/ 


printf  ("rnrnctctir  20"?  ? ACL?  IJUIC7I0N  - 271!;?  D?27t;{  S2.“.CLA?C?=n=nn)  ; 
print;  ("r!!A3I  T - STS? r.“.  DISCS IP?ro:Jcnca=n ”)  ; 


alutoptr^  alu; 
aeatoptr-  n° n ; 
cputoptr=  epu : 
bustoptr=  bus; 
vsdtoptr'3  vsd; 
pertoptr=  per; 

iobu;-  Cdiskbuf; 
arc?tr=  arc; 


/•*  initialize  to?  pointers  */ 


/*  initialize  tasks  •/ 


goto  erroff; 
} 


if  (fldtype[  fldindex  ] — 1)  ( /«  oatout  fid?  »/ 

error  (b); 
goto  erroff; 

} 


sequence^  1;  /*  enable  error  sesgs  »/ 


switch  £i=*  f Indian®  ( nr.itaaa  ))  [ 

case  0:  /*  new-  unit  «/ 

if  (enteraait  (uaitr.aa,  fldaao)  — 0)  £ 

error ( 7)  ; 
goto  erroff: 

} 

break; 

default: 

if  (fldaan[0j  !=  i)  ( 
error  (3)  ; 
goto  erroff; 

} 

if  (*  (structpt r=  naseacdr  * fid isp  * ?C Z7i~:Z~)  !=  0)  £ 

error  (5)  ; /*  defined  before  * / 

goto  erroff; 

} 

) 


/*  old  unit  ■/ 

/*  consistent?  */ 


* (struct  ptr=  baseaddr  * fldisp  ♦ POlh'TI?.)  = arcptr; 
outindex=  unitiade.t; 
oaty?e=  fldnas[."]; 


/*  field  address  */ 


for  (ccona=  0;  conca  ==  0 I!  getto'<())  ==  ccoaa**)  £ 

if  ( (n=  cetfield(  uaitnaa,  fldnan,  Cfldisp}}  ==  -1)  goto  eof; 
else  if  (a  = = C)  goto  erroff; 


Lf  (fldtT?e£  f liindex  } ==  2)  { 

error  (6)  ; 
goto  again; 

1 


/«  input  fid?  «/ 


switch  (fc-  fir.dnme  ( uaitr.aa  ))  £ 

case  3: 

if  (en terunit ( unitnaa,  flinas)  -=  0)  £ 

error  (7)  ; 


/*  defined  bi?  «/ 
/»  nope  »/ 
goto again;


default: 


break; 


if  (fldnaafO]  !=  k)  { 
error  ( 3}  ; 
goto  again; 

} 


/*  yes  */ 


if  { (outindex  ==  unitindex)  SS 

{outype  — fldcaofO]))  ( 
error  (8)  ; 
goto  aoaia; 

] 


/*  looping  ? */ 


»arcptr*+=  fldnanfO]; 
*arcrtr*+=  unitiadex; 
*arcptr++=  baseaddr  ♦ fldisp; 


/*  pointers  into  arc  V 


again: ; 

} 

■a rcptr* ♦=  0; 
goto  loop; 

erroff:  /*  here  is  the  error  handling  code  */ 

sequence-  0;  /*  disable  error  aessages  -/ 

if  (getline()  ==  -1)  goto  eof;  /*  skip  the  rest  of  the  present  line  */ 

looo: ; 

} 

/»*»»».**.  H322  IS  THS  f:»D  0?  TH  a IXTZ3E22T23  • *»•«**»* 

V 
eof : 


printf  ("cnsncnCATAiOGai:  0?  SII3TI5G  airiTScaon")  ; 
printf  (" Maoert  TypectConaected  fieldscr.cn")  ; 

fidisp=  unitindex=  0;  /*  initialite  «/ 

for  (i=  0;  i < SOKCHIIS;  i*-*)  ( /*  unit  type  loop  «/ 

for  {j=  0;  j < topindex[  i ];  j*+)  [ /-‘unit  index  loop  », 

switch  (i)  ( /*  get  baseaddr  */ 

case  0 : 


baseaddr=  Saluf  j ] ; 
break; 


case  1 


case  2 : 


case  3 : 


case  i: 


case  5 : 


baseaddr=  Cneaf  j ]; 
break; 


baseaddr=  Ccpuf  * ]; 
break: 


baseaddr=  Cbusf  j j; 
break; 


baseaddr^  r-7cd£  j ]; 
break ; 


baseaddr=  Cper£  J ]; 
break ; 
}


printf  (" tsot  ' sc  t" , basea  ddr , type[  i ])  ; /*  print  nane  S 

baseaddr=*  10;  /■*  skip  nans  field  "/ 

oldfldisp=  fldisp; 

for  (fldinde:r=  0;  fldindex  ==  0 | | 

displace^  fldisp  ] !=  0; 
fldindex**)  ( 

if  (connected ( fldtypef  fldisp  ] )) 

printf  C'Es  " ",fieids[  fldisp  ])  ; 

* (otr=  baseaddr  + flOID) = 

(i  « 9)  ♦ (j  <<  4)  * fldindex; 
fldisp ♦+; 

baseaddr=  baseaddr  + ELCSIZZ  * 2; 

} 

fldisp=  oldfldisp; 
printf  ("-i")  ; 
unitindex** ; 

} 


for  (fldiso**;  disolaca  [ fldiso  ];  fldiso**)  ; 

} 


printf  ("rnsnenONlT  ?ALL?=n-nH)  ; 
for  (3=  i=  0 ; i < HOKOHITS ; i*+)  { 

printf  ("t.sot 'don",  type  £ i ],  to?index[  i J)  ; 
j=+  to?index[i]; 

) 

printf  ("cato talatSdrncn" , j) ; 

} 

/* 

* G 2 ? N A 3 E 


arguaents: 

returns: 


notes: 


'/ 


target  address  of  naae 
>0  sea  ns  all's  OK 

0 Deans  invalid  lead  char,  in  stress 
-1  aeans  eof  detected 

lineptr  is  left  pointing  to  the  first  char, 
after  the  last  valid  sane  char. 


getnaae  (ptr)  char 
int  n,i; 

«pt 

r:  t 

i=  Ptr; 

if  (skipspac=()  ! = 

-IT 

l 

vhile  ( ( (tv=  *Iir.ep 

tr) 

>=  ASC  0 

SS 

n <=  ASC_9) 

(ri 

== 

) Ti 

(n 

ss 

’S’  ) ! l 

(a 

>= 

ASC_ A SC 

n <= 

A5C_Z)  I ) 

(n 

ASC~a  SC 

n <= 

ASC_=)  ) ( 

lineot 

= *-*; 

('(P 

tr-  i) 

< 9)  *?  tr* 

} 

if  { ptr  — i]  return  (0);  else  [ 
•ptr-  ’=0’; 
return  (1) ; 

) 

) 

return  (-1)  ; 

} 

/*
 *      G E T T O K
 *
 *      arguments:      none.
 *      returns:        the ascii char of an accepted token
 *                      0 if first non-space was not a token
 *                      -1 if an eof was encountered
 *      note:           lineptr is left pointing to the first char
 *                      after an accepted token or at the first
 *                      invalid char.
 */

gettok() {
        if (skipspace() != -1)
                switch (*lineptr++) {
                case '>':       return ('>');
                case ',':       return (',');
                case '.':       return ('.');
                default:        lineptr--; return (0);
                }
        return (-1);
}

/*
 *      G E T L I N E
 *
 *      arguments:      none.
 *      returns:        zero if all's OK
 *                      -1 on an eof condition
 *      notes:          a line of data from the input file is placed
 *                      in the line vector, lineptr is left pointing
 *                      to the beginning of the line.
 */

getline() {
        for (lineptr= line; (*lineptr= putchar(getc(iobuf))) != -1 &&
            *lineptr != '\n'; lineptr++);   /* input, print, & store input data */

        if (*lineptr == -1) {           /* eof condition? */
                if (lineptr != line)
                        error(9);
                return (-1);
        }
        lineptr= line;
        return (0);
}

/*
 *      S K I P S P A C E
 *
 *      arguments:      none
 *      returns:        0 if all's cool
 *                      -1 on an eof
 *      notes:          lineptr is left pointing to the first char not
 *                      a \n, tab, or a space.
 */

skipspace() {
        while (1) switch (*lineptr) {
        case SPACE:
        case TAB:
                lineptr++;
                break;
        case '\n':
                if (getline() == -1) return (-1);
                break;
        default:
                return (0);
        }
}
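
For readers unfamiliar with the dialect of the listing, the following is a minimal present-day C sketch of the same whitespace-skipping and token-recognition idea. It works on an in-memory string through a hypothetical cursor type rather than the global `lineptr` and the file reads of the listing, so it is an illustration and not a replacement for the routines above.

```c
#include <stddef.h>

/* Hypothetical cursor over a NUL-terminated input string. */
typedef struct { const char *p; } cursor;

/* Returns '>', ',' or '.' and advances past it; returns 0 and leaves the
   cursor on the offending character when the first non-space character
   is not a token; returns -1 when the input is exhausted. */
static int next_token(cursor *c)
{
    while (*c->p == ' ' || *c->p == '\t' || *c->p == '\n')
        c->p++;
    if (*c->p == '\0')
        return -1;
    switch (*c->p) {
    case '>':
    case ',':
    case '.':
        return *c->p++;
    default:
        return 0;
    }
}
```

The return convention mirrors the header comment of the token routine in the listing: the token character itself, 0 for a non-token, and -1 at end of input.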
/*
 *      G E T F I E L D
 *
 *      arguments:      1) target address for unit name
 *                      2) target address for field name
 *                      3) target address for field displacement
 *      returns:        1 if all's cool
 *                      0 if an error was encountered
 *                      -1 if an eof was encountered
 *      notes:          the expected format is [unitname] [.] [fieldname]
 *                      if the format does not conform to that of the
 *                      input stream, one of the format errors messages
 *                      is printed.
 */

getfield(unitptr, fldptr, addr) char *unitptr, *fldptr; int *addr; {
        int n;

        if ((n= getname(unitptr)) == -1) return(-1);    /* get unit name */
        else if (n == 0) {
                error(0);
                return(0);
        }
        if ((n= gettok()) == -1) return(-1);            /* get token */
        else if (n != '.') {
                error(0);
                return(0);
        }
        if ((n= getname(fldptr)) == -1) return(-1);     /* get field name */
        else if (n == 0) {
                error(0);
                return(0);
        }

        if ((fldindex= findfield(fldptr)) == -1) {      /* valid fld name? */
                error(1);
                return(0);
        }

        *addr= 10 + displace[fldindex] * 2 * FLDSIZE;
        return(1);
}
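
The last assignment in the routine above turns a field index into a byte offset within a unit structure. The sketch below restates that arithmetic under two assumptions that the listing suggests but does not state: the name array occupies 10 bytes once padded, and an int is 2 bytes wide. The macro and function names are hypothetical, and the field size passed in stands for FLDSIZE, whose actual value lives in the define file.

```c
#include <stddef.h>

#define NAME_BYTES 10   /* 9-character name array padded to an even offset (assumed) */
#define INT_BYTES   2   /* 16-bit int, as the factor of 2 in the listing suggests */

/* Byte offset of the field with the given displacement within its unit,
   where every field occupies fldsize ints. */
static size_t field_offset(int displacement, int fldsize)
{
    return NAME_BYTES + (size_t)displacement * INT_BYTES * (size_t)fldsize;
}
```

For example, with a hypothetical field size of 4 ints, the field with displacement 3 would start 34 bytes into the structure.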

/*
 *      ERROR
 *
 *      arguments:      index number of the error message to print
 *      returns:        zero
 *      notes:          none.
 */

error (index) int index; {
    int i;

    if (sequence) {                     /* error reporting flag? */
        for (i= 0; i < (lineptr- line- 1); i++) printf (" ");
        printf ("^\n");
        printf ("*ERROR: %s\n\n", errmesg[ index ]);
        errors++;
    }
}

/*
 *      OPENFILE
 *
 *      arguments:      target address for filename
 *      returns:        zero
 *      notes:          routine inputs a file name from the terminal
 *                      and attempts to access it.
 */

openfile (s) char *s; {
    int i;
    char *ptr;

    for (i= -1; ptr == s || i == -1;) {
        printf ("\nInput file? ");
        ptr= s;
        while ((*ptr= getchar ()) != '\n') ptr++;
        *ptr= '\0';
        if ((i= fopen (s, iobuf)) == -1)
            printf ("*ERROR: %s bad file name\n", s);
    }
    printf ("\n\n");
}


/*
 *      FINDNAME
 *
 *      arguments:      address of the name to be found
 *      returns:        an ascii value of the first letter of the
 *                      unit-type (a, c, m, p, v, or b) if
 *                      the name was found
 *                      zero if the name was not found
 *      notes:          in the case where the name was found, the global
 *                      variables unitindex and baseaddr are set.
 */

findname (s1) char *s1; {
    int i,n;

    for (i= 0; i < NUMUNITS; i++) {
        n= 0;
        switch (i) {

        case 0:
            for (aluptr= alu; aluptr < alutoptr; aluptr++) {
                if (compare (s1, aluptr)) {
                    unitindex= n;
                    baseaddr= aluptr;
                    return ('a');
                }
                n++;
            }
            break;

        case 1:
            for (memptr= mem; memptr < memtoptr; memptr++) {
                if (compare (s1, memptr)) {
                    unitindex= n;
                    baseaddr= memptr;
                    return ('m');
                }
                n++;
            }
            break;

        case 2:
            for (cpuptr= cpu; cpuptr < cputoptr; cpuptr++) {
                if (compare (s1, cpuptr)) {
                    unitindex= n;
                    baseaddr= cpuptr;
                    return ('c');
                }
                n++;
            }
            break;

        case 3:
            for (busptr= bus; busptr < bustoptr; busptr++) {
                if (compare (s1, busptr)) {
                    unitindex= n;
                    baseaddr= busptr;
                    return ('b');
                }
                n++;
            }
            break;

        case 4:
            for (vsdptr= vsd; vsdptr < vsdtoptr; vsdptr++) {
                if (compare (s1, vsdptr)) {
                    unitindex= n;
                    baseaddr= vsdptr;
                    return ('v');
                }
                n++;
            }
            break;

        case 5:
            for (perptr= per; perptr < pertoptr; perptr++) {
                if (compare (s1, perptr)) {
                    unitindex= n;
                    baseaddr= perptr;
                    return ('p');
                }
                n++;
            }
            break;
        }
    }
    return (0);
}

/*
 *      ENTERUNIT
 *
 *      arguments:      1) address of name to be entered
 *                      2) unit type (a, b, c, m, p, or v)
 *      returns:        unit type
 *      notes:          use the subroutine to enter a new unit into
 *                      a structure.
 */

enterunit (s1, s2) char *s1, *s2; {

    switch (*s2) {

    case 'a':
        if (topindex[0] == MAXALU) return (0);      /* unit quota */
        baseaddr= alutoptr;
        stringxfer (s1, alutoptr++);                /* enter name field */
        unitindex= topindex[0]++;
        return ('a');

    case 'm':
        if (topindex[1] == MAXMEM) return (0);
        memtoptr->pdpbase= &memory[ topindex[1] * 2 * MEMSIZE ];
        baseaddr= memtoptr;
        stringxfer (s1, memtoptr++);
        unitindex= topindex[1]++;
        return ('m');

    case 'c':
        if (topindex[2] == MAXCPU) return (0);
        baseaddr= cputoptr;
        stringxfer (s1, cputoptr++);
        unitindex= topindex[2]++;
        return ('c');

    case 'b':
        if (topindex[3] == MAXBUS) return (0);
        baseaddr= bustoptr;
        stringxfer (s1, bustoptr++);
        unitindex= topindex[3]++;
        return ('b');

    case 'v':
        if (topindex[4] == MAXVSD) return (0);
        baseaddr= vsdtoptr;
        stringxfer (s1, vsdtoptr++);
        unitindex= topindex[4]++;
        return ('v');

    case 'p':
        if (topindex[5] == MAXPER) return (0);
        baseaddr= pertoptr;
        stringxfer (s1, pertoptr++);
        unitindex= topindex[5]++;
        return ('p');

    default:
        return (0);
    }
}

/*
 *      CONNECTED
 *
 *      arguments:      field type (1=input, 2=output, 3=io)
 *      returns:        1 if field has been connected
 *                      0 if field has no connections (god forbid)
 */

connected (fldtyp) int fldtyp; {
    int *ptr;

    switch (fldtype[ fldisp ]) {

    case 1: return (match (baseaddr));                      /* input field */
    case 2: return (*(ptr= baseaddr + POINTER));            /* output field */
    case 3: if (*(ptr= baseaddr + POINTER)) return (1);     /* io field */
            return (match (baseaddr));
    }
}


£3 ill  02  ST3CX  3C33CGTIX2S 


V 


match (s) int s; {      /* looks for s in arc (looks for connection) */
    int *ptr;           /* returns 0 if none, 1 if present */

    ptr= arc;
    ptr =+ 2;

    while (ptr < arcptr) {
        if (*ptr++ == s) return (1);
        else if (*ptr == 0) ptr =+ 3;
        else ptr =+ 2;
    }
    return (0);
}


stringxfer (s1, s2) char *s1, *s2; {    /* (s1) -> (s2) */

    while ((*s2= *s1++) != '\0')
        s2++;
}

findfield (s1) char *s1; {      /* looks for (s1) in field table */
    int i;

    for (i= 0; fields[i] != 0; i++)
        if (compare (fields[i], s1)) return (i);
    return (-1);
}


compare (s1, s2)        /* compare (s1) with (s2) */
char *s1, *s2; {        /* returns 1 if same, 0 if differ */

    while (*s1++ == *s2)
        if (*s2++ == '\0')
            return (1);
    return (0);
}
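
The helpers above (compare, findfield) amount to a linear search over a zero-terminated table of names. The following is a minimal sketch of that idea in present-day C, not the listing's own code: the names `same_name`, `find_in_table`, and `demo_fields` are hypothetical, and the table contents are invented for illustration because the real `fields` table is not shown in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field table; the real contents of `fields` are not shown. */
static const char *const demo_fields[] = { "in", "out", "acc", NULL };

/* Returns 1 if the two strings are identical, 0 otherwise
   (the same contract the listing documents for compare). */
int same_name(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return strcmp(s1, s2) == 0;
}

/* Returns the index of `name` in a NULL-terminated table, or -1 if absent
   (the same contract the listing documents for findfield). */
int find_in_table(const char *const table[], const char *name)
{
    int i;

    for (i = 0; table[i] != NULL; i++)
        if (same_name(table[i], name))
            return i;
    return -1;
}
```

A caller would treat a return of -1 as "unknown field", which is the case where getfield reports error 1.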

/*
 ****** MAIN2 ******
 arguments: none

 returns:   1 if everything is OK
            -1 if uncorrectable error

 notes:     mainline program for phase 2 of program
*/

main2 ()
{   int i, j, k, l, m, n;

    printf ("\n\n*** PHASE TWO: ASSEMBLER \n");

    sequence = 1;
    initq ();
    printq (head);
    initfaults ();

    /* get name of memory where program will be loaded, check
       validity (i.e. if it is a memory unit) */
    printf ("\n\nName of memory where program will be loaded?\n");
    getmemname ();
    count = 0;

    while (++count <= 2)
    {   if (findname (memload) == 'm') break;   /* all's OK */
        printf ("\nInvalid memory name. Retype memory name\n");
        getmemname ();
    }

    if (count > 3)      /* no correct memory name in 3 tries */
    {   printf ("Check memory names from phase I \n",
                "***** PROGRAM TERMINATED *****\n");
        return (-1);
    }

    /* initialize program address to beginning of given memory
       and initialize counting variables, symtab entries */
    progaddr = mem[unitindex].pdpbase;
    progbeg = progaddr;
    symptr = symtab[0];
    asymptr = symaddr;
    memstart = progaddr;
    nameptr = name;

    p2adptr = p2addr;

    /* ***** PASS 1 ***** */

    openfile (&phase2file);
    while (getline () != -1)
    {
        memcount = progaddr - memstart;
        if ((temp = getname (name)) <= 0)
        {   error (22);
            goto p1loopbot;     /* skip rest of line, go
                                   to bottom of while loop */
        }

        if (*lineptr == ':')    /* is a label */
        {
            copy (name, symptr);
            symptr =+ 9;
            *symptr = 0;
            *asymptr++ = memcount;
            lineptr++;
            if (getname (name) <= 0)
            {   error (22);
                goto p1loopbot;
            }
        }

        /* name should now be an instruction */
        if (*nameptr == '$')    /* PSEUDO */
        {   if ((temp=findstr (pseudos,&name[1])) < 0) error (23);
            else
                switch (temp)
                {
                case 0:     /* DEC */
                    pseudo (10);
                    break;

                case 1:     /* D2?0 */
                    pseudo (8);
                    break;
                }
        }
        else
            if (*nameptr == '*')    /* fault injection */
            {   *progaddr =| 0100000;
                codefltinj ();
            }
            else
            {
                /* regular (non-fault injection) instruction */
                if ((instnum = findstr (inst,name)) >= 0) codeinst (instnum);
                else    /* can't find instruction */
                    error (22);
            }

p1loopbot: ;
    }

    symend = symptr;
    symptr = symtab;
    asymptr = symaddr;

    /* print symbol table and associated address */
    printf (" SYMBOL TABLE\n");
    printf (" label address\n");

    while (symptr < symend)
    {
        printf (" %s %6o\n", symptr, *asymptr);
        symptr =+ 9;
        asymptr++;
    }

    /* ***** PASS 2 ***** */

    printf ("beg=%6o end=%6o \n", symtab, p2adptr);
    p2endaddr = p2adptr;
    p2adptr = p2addr;
    openfile (&phase2file);

    while (getline () != -1)
    {
    for (i=0; arr[i] != 0; i++) {
        for (j=0; (r = arr[i][j]) == str[j] &&
                  r != '\0'; j++) ;
        if (r == str[j])
            return (i);
    }
    return (-1);
}

/*
 ****** LABELFIND ******

 arguments: none
 returns:   row number in symtab where label is found
            if the label was found in symtab
            -1 if the label was not found
 notes:     searches for name (that character string)
            in symbol table
*/

labelfind ()
{   char r;
    int i, j;

    for (i=0; symtab[i][0] != 0; i++) {
        for (j=0; (r=symtab[i][j]) == name[j] &&
                  r != '\0'; j++) ;
        if (r == name[j]) return (i);
    }
    return (-1);
}


/*
 ****** PSEUDO ******

 arguments: the base in which the numbers are to be
            interpreted
 returns:   none
 notes:     gets ascii characters from input
            and converts them to a number
*/

pseudo (base)
int base;
{   int n, sign;

    while (1)
    {   if ((temp=spaces ()) < 0)
        {   error (13);
            return;
        }

        if (*lineptr == '-')
        {   sign = -1;
            lineptr++;
            spaces ();
        }
        else sign = 1;
        temp = lineptr;
        n = 0;

        if (base == 10)
        {   while (*lineptr >= '0' && *lineptr <= '9')
                n = n*10 + *lineptr++ - '0';
        }
        else
        {   while (*lineptr >= '0' && *lineptr <= '7')
                n = n*8 + *lineptr++ - '0';
        }

        if (temp == lineptr)
        {   error (15);
            return (-1);
        }

        *progaddr++ = n * sign;
        if (spaces () == 0) break;
        if (getcomma (0) < 0) break;
    }
}


/*
 ****** CODEFLTINJ ******

 arguments: none
 returns:   none
 notes:     calls appropriate routines for
            fault injection instructions
*/

codefltinj ()
{
    /* check for special fault injection */
    if ((instnum = findstr (sfinst,&name[0])) >= 0)
                        /* is special fault injection */
        specfault (instnum);
    else
        if ((instnum = findstr (inst,&name[1])) >= 0) codeinst (instnum);
        else    /* can't find instruction */
            error (23);
}


/*
 ****** SPACES ******

 arguments: none
 returns:   -1 if end of line is encountered
            0 if semicolon is encountered
            1 if any other character is encountered
 notes:     skips spaces until hits character other
            than blank and returns value corresponding
            to last character
*/

spaces ()
{
    while (*lineptr == ' ' || *lineptr == '\t') lineptr++;
    if (*lineptr == '\n') return (-1);
    if (*lineptr == ';') return (0);
    return (1);
}
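
The three-way return convention of spaces (end of line, semicolon, anything else) is what lets callers such as getcomma and freqvalue distinguish "line finished" from "comment starts" from "more operands follow". A minimal sketch in present-day C follows. It is an illustration only: `skip_blanks` is a hypothetical name, and the cursor is passed explicitly instead of using the listing's global `lineptr`.

```c
#include <assert.h>
#include <stddef.h>

/* Advances *pp past blanks and tabs, then classifies the character found:
   -1 for newline, 0 for ';', 1 for anything else. */
int skip_blanks(const char **pp)
{
    assert(pp != NULL && *pp != NULL);
    while (**pp == ' ' || **pp == '\t')
        (*pp)++;
    if (**pp == '\n')
        return -1;
    if (**pp == ';')
        return 0;
    return 1;
}
```

Passing the cursor by address keeps the sketch free of global state, at the cost of a slightly noisier call site than the original.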

/*
 ****** BITCONVERT ******

 arguments: none
 returns:   -1 if a number is not found or the
            number is < 0 or > 31
            the decimal value if conversion was OK and
            result was valid
 notes:     converts ascii character pointed to by
            lineptr to its decimal value, and moves
            lineptr to first non-numeric character
            encountered and checks the value as above
*/

bitconvert ()
{
    int n;

    temp = lineptr;
    n = 0;

    while (*lineptr >= '0' && *lineptr <= '9')
        n = n * 10 + *lineptr++ - '0';
    if (temp == lineptr || n > 31)
    {   error (13);
        return (-1);
    }
    return (n);
}
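
The conversion in bitconvert can be sketched in present-day C as below. This is an assumption-laden illustration, not the original routine: `parse_bit_number` is a hypothetical name, the cursor is passed explicitly instead of using `lineptr`, and the accumulation is capped so that a very long digit string cannot overflow, which the original does not guard against.

```c
#include <assert.h>
#include <stddef.h>

/* Parses an unsigned decimal bit number at *pp and advances *pp past
   the digits. Returns the value, or -1 if no digit is present or the
   value exceeds 31. */
int parse_bit_number(const char **pp)
{
    const char *start;
    int n = 0;

    assert(pp != NULL && *pp != NULL);
    start = *pp;
    while (**pp >= '0' && **pp <= '9') {
        if (n <= 31)                    /* once out of range, stop growing n */
            n = n * 10 + (**pp - '0');
        (*pp)++;
    }
    if (*pp == start || n > 31)
        return -1;
    return n;
}
```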


/*
 ****** CONVERT ******

 arguments: none
 returns:   0 if a number is not found
            the decimal value if conversion was OK
 notes:     converts ascii character pointed to by
            lineptr to its decimal value, and moves
            lineptr to first non-numeric character
            encountered, checking to insure number
            is > 0
*/

convert ()
{
    int n, sign;

    if (*lineptr == '-') sign = -1;
    else sign = 1;

    if (*lineptr == '+' || *lineptr == '-')
    {   lineptr++;
        spaces ();
    }

    temp = lineptr;
    n = 0;

    while (*lineptr >= '0' && *lineptr <= '9')
        n = n * 10 + *lineptr++ - '0';
    if (temp == lineptr)
    {   error (15);
        return (-1);
    }

    n = n * sign;
    return (n);
}


/*
 ****** GETSA0 ******

 arguments: address where 32 bit stuck at fault mask
            will be placed
 returns:   0 if last character processed is ;
            -1 if last character processed is new line
            1 if not one of the above (problem code)
 notes:     gets ascii values of bit numbers to be
            stuck at 0 and generates the appropriate
            mask placed in *addr and *(addr + 1)
*/

getsa0 (addr)
int *addr;
{
    *addr = 0177777;            /* initialize to all ones */
    *(addr + 1) = 0177777;
    while (spaces () > 0)
    {
        if ((temp = bitconvert ()) >= 0)
        {
            if (temp < 16) *addr =& zeroes[temp];
            else *(addr + 1) =& zeroes[temp - 16];
        }
        else return (1);        /* bad number */

        if ((temp = spaces ()) <= 0) return (temp);
        if (*lineptr != ',') error (11);
        lineptr++;
    }
    error (12);
    return (1);
}

/*


 ****** GETSA1 ******

 arguments: address where 32 bit stuck at fault mask
            will be placed
 returns:   0 if last character processed is ;
            -1 if last character processed is new line
            1 if not one of the above (problem code)
 notes:     gets ascii values of bit numbers to be
            stuck at 1 and generates the appropriate
            mask placed in *addr and *(addr + 1).
*/

getsa1 (addr)
int *addr;
{
    *addr = 0;                  /* initialize to all zeroes */
    *(addr + 1) = 0;
    while (spaces () > 0)
    {
        if ((temp = bitconvert ()) >= 0)
        {
            if (temp < 16) *addr =| ones[temp];
            else *(addr + 1) =| ones[temp - 16];
        }
        else return (1);        /* bad number */

        if ((temp = spaces ()) <= 0) return (temp);
        if (*lineptr != ',') error (11);
        lineptr++;
    }
    error (12);
    return (1);
}

/*
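
The two mask builders differ only in their starting value and combining operator: a stuck-at-0 mask starts as all ones and clears the listed bits, while a stuck-at-1 mask starts as all zeroes and sets them. The sketch below illustrates this in present-day C. It rests on assumptions that should be checked against the original tables: bit 0 is taken to be the most significant bit of the first 16-bit word (inferred from comments such as "bits 0-3" paired with the constant 0170000), and the names `bit_in_word`, `stuck_at_0_mask`, and `stuck_at_1_mask` are hypothetical stand-ins for the listing's `ones[]` and `zeroes[]` tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bit 0 is the MSB of word[0], bit 16 the MSB of word[1]. */
static uint16_t bit_in_word(int bit)
{
    return (uint16_t)(0x8000u >> (bit % 16));
}

/* Stuck-at-1 mask: start from all zeroes and set each listed bit. */
void stuck_at_1_mask(const int *bits, size_t count, uint16_t word[2])
{
    size_t i;

    word[0] = word[1] = 0;
    for (i = 0; i < count; i++) {
        assert(bits[i] >= 0 && bits[i] <= 31);
        word[bits[i] / 16] |= bit_in_word(bits[i]);
    }
}

/* Stuck-at-0 mask: start from all ones and clear each listed bit. */
void stuck_at_0_mask(const int *bits, size_t count, uint16_t word[2])
{
    size_t i;

    word[0] = word[1] = 0xFFFFu;
    for (i = 0; i < count; i++) {
        assert(bits[i] >= 0 && bits[i] <= 31);
        word[bits[i] / 16] &= (uint16_t)~bit_in_word(bits[i]);
    }
}
```

Applying a fault would then be `value & sa0_mask` for stuck-at-0 and `value | sa1_mask` for stuck-at-1, which is why the two masks have opposite polarity.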


 ****** FREQVALUE ******

 arguments: address where the two integer values
            will be placed
 returns:   none.
 notes:     gets ascii values of integers used in
            calculation of frequency of fault
            injection (random stuck at's)
            and places the numerator in addr and the
            denominator in addr + 1
*/

freqvalue (addr)
int *addr;
{
    if (spaces () < 0)
    {   /* unexpected end of line */
        error (16);
        return;
    }

    if (*lineptr++ != ';') return;
    if (spaces () <= 0)
    {   error (16);
        return;
    }

    if ((temp = convert ()) <= 0)       /* get numerator */
    {   /* invalid number encountered */
        error (15);
        return;
    }

    *addr++ = temp;
    if (spaces () <= 0)
    {   /* eof or ; encountered */
        error (15);
        return;
    }

    if (*lineptr++ != '/') error (14);
    if (spaces () <= 0)
    {   error (15);
        return;
    }

    if ((temp = convert ()) <= 0)       /* get denominator */
    {   /* invalid number encountered */
        error (15);
        return;
    }

    *addr++ = temp;
}


/*
 ****** GETFLDID ******

 arguments: none
 returns:   -1 if an error in the field is encountered
            or an eof is encountered
            0 if the unit name cannot be found
            1 if everything is OK
 notes:     this routine gets the field ID number
            of the string which begins at the address
            lineptr. At the end of the routine, the
            id can be found in
            "*(baseaddr + fldisp + FLDID)"
*/

getfldid ()
{
    if ((temp = getfield (&unitname,&fieldname,&fldisp)) < 1)
                        /* check for error or eof */
    {
        if (temp == 0) error (0);       /* illegal field name */
        else error (9);
        return (-1);
    }

    if (findname (&unitname) == 0)      /* can't find unitname */
    {
        error (0);
        return (0);
    }
    return (1);
}

/*
 ****** SPECFAULT ******

 arguments: number of row in array sfinst where the instruction
            mnemonic is found
 returns:   none.
 notes:     routine to generate code for special fault
            injection instructions.
*/

specfault (num)
int num;
{
    *progaddr =| 0170000;       /* insert code indicating inst is a
                                   special fault injection instruction
                                   (in bits 0-3) */

    *(progaddr + 1) =| faultcode[num];  /* set bits 16-19 to
                                   appropriate instruction code */

    /* get field id number determined in phase I, place in
       bits 4-15 */
    if (getfldid () <= 0) return;       /* bad field */

    *progaddr =| *(spare=baseaddr + fldisp + FLDID);
    progaddr =+ 2;

    if (num == 0)   /* ?.A? */  return;

    if (getcomma (0) <= 0) return;      /* bad format */

    /* generate remaining operands if present */
    switch (num) {

    case (1):   /* D2 */
        if ((temp=convert ()) < 0) error (27);
        else
        {   *progaddr = temp;
            progaddr =+ 2;
        }
        break;

    case (2):   /* *SA0 */
        getsa0 (progaddr);
        progaddr =+ 2;
        break;

    case (3):   /* *SA1 */
        getsa1 (progaddr);
        progaddr =+ 2;
        break;

    case (4):   /* *RSA0 */
        getsa0 (progaddr);
        progaddr =+ 2;
        freqvalue (progaddr);
        progaddr =+ 2;
        break;

    case (5):   /* *RSA1 */
        getsa1 (progaddr);
        progaddr =+ 2;
        freqvalue (progaddr);
        progaddr =+ 2;
        break;

    case (6):   /* *370 */
        getsa1 (progaddr);
        progaddr =+ 2;
        break;

    case (7):   /* *??i */
        getsa0 (progaddr);
        progaddr =+ 2;
        break;
    }
}

/*
 ****** GETCOMMA ******

 arguments: 1 if comma optional
            0 if comma required
 returns:   -1 if error encountered
            0 if optional comma is not present
            1 if all is OK
 notes:     skips spaces until finds a comma
            and then skips spaces, leaving lineptr
            pointing to the first non-blank character
            which is not an end of line or ;
*/

getcomma (opt)
int opt;
{
    if (spaces () <= 0)
    {   if (opt == 1) return (0);   /* optional comma not present */
        error (18);
        return (-1);
    }

    if (*lineptr++ != ',')
    {   error (17);
        return (-1);
    }

    if (spaces () <= 0)
    {   error (13);
        return (-1);
    }
    return (1);
}


/*
 ****** LABOFFSET ******

 arguments: none
 returns:   -1 if error
            0 otherwise
 notes:     checks to see if there is an offset following
            the label (i.e., plus or minus a constant)
*/

laboffset ()
{   if (spaces () <= 0) return (0);

    if (*lineptr == ',') return (0);
    if (*lineptr != '+' && *lineptr != '-')
    {   error (25);
        return (-1);
    }

    lineptr++;

    if (spaces () <= 0) return (-1);
    convert ();


/*
 ****** IMM_OP ******

 arguments: row number of array inst where the instruction
            is found
 returns:   none
 notes:     routine to generate code for 2-operand
            instructions
*/

imm_op (inum)
int inum;
{
    /* assign proper bit values to bits 1-3 */
    *(progaddr) =| 030000;

    /* set bits 16-19 */
    *(progaddr + 1) =| regcode[inum];

    /* get field id number and place in bits 4-15 */
    if (getfldid () <= 0) return;   /* bad name, skip rest of line */
    *progaddr++ =| *(spare=baseaddr + fldisp + FLDID);

    /* get immediate value */
    if (getcomma (0) <= 0) return;  /* bad format */
    if ((temp=convert ()) < 0)
    {   *progaddr =| ones[4];       /* set sign bit */
        temp = abs (temp);
    }

    if (temp >= 4096)
    {   /* number is too large */
        progaddr++;
        error (21);
        return;
    }

    if (temp != 0) *progaddr =| temp;
    progaddr++;
}

/*
 ****** TWO_OP ******

 arguments: row number of array inst where the
            instruction mnemonic is found
 returns:   none
 notes:     routine to generate code for 2-operand
            instructions
*/

two_op (inum)
int inum;
{   /* bits 1-3 are all zeroes initially as desired
       set bits 16-19 to specific instruction type */
    *(progaddr + 1) =| regcode[inum];

    /* get field id numbers and place in bits 4-15 and
       20-31 respectively */
    if (getfldid () <= 0) return;   /* error in getting field */
    *progaddr =| *(spare=baseaddr + fldisp + FLDID);
    if (getcomma (0) <= 0) return;  /* bad format */
    if (inum <= 2)      /* no memory reference-get second field */
    {
        if (getfldid () <= 0) return;
        *(progaddr + 1) =| *(spare=baseaddr + fldisp + FLDID);
    }
    else
    {   /* LBS or STS */
        *p2adptr++ = progaddr + 1;
        if (getname (name) <= 0) return;
        laboffset ();
    }

    /* check for indirection */
    if (getcomma (1) <= 0)      /* no indirection */
    {   progaddr =+ 2;
        return;
    }

    if (*lineptr != 'I')
    {   error (19);
        return;
    }

    /* set bit one to indicate indirection */
    *(progaddr) =| ones[1];
    progaddr =+ 2;
}

/*
 ****** BRANCH_OP ******

 arguments: row number of array inst where instruction
            mnemonic is found
 returns:   none
 notes:     routine to generate code for branch and halt
            instructions
*/

branch_op (inum)
int inum;
{   int n;

    /* assign proper bit values to bits 1-3 */
    *progaddr =| 050000;

    /* set bits 16-19 */
    *(progaddr + 1) =| regcode[inum];

    if (inum >= 24 && inum <= 25)   /* a branch instruction */
    {   if (spaces () <= 0)
        {   error (13);
            return;
        }

        if (inum == 25)     /* conditional-get cond. code */
        {   if ((n=convert ()) > 21 || n < 0)
            {   error (29);
                return;
            }
            *(progaddr + 1) =| n;
            if (getcomma (0) < 0) return;
        }

        /* get label */
        if (getname (name) <= 0) return;
        if (laboffset () < 0) return;
        if (getcomma (1) == 1)
        {   /* should be indirect */
            if (*lineptr == 'I')
                *(progaddr + 1) =| 0100000;
            else error (19);
        }

        *p2adptr++ = progaddr;
    }

    /* SIT or 327 requires no further arguments */
    progaddr =+ 2;
}

/* 
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*****  0TH23  “? 


arguments: 

retu  rns: 
rotes: 


rov  nunber  of  array  i-st  v here 
instruction  rneuoric  is  found 
none 

routine  to  generate  code  for  ~- 
instructions 


*/ 

otter, 
iat  i; 


cp  (inun) 
uo; 

i 

/*  assign  proper  bit  7alues  to  bits  1-3  */ 

♦progaddr  =1  060000; 

■ /*  set  bits  16-19  */ 

* (progaddr  ♦ 1)  =|  regcodef inun ] ; 

•(pcogaddr  * 1)  =|  5; 

svitch-(inaa) 

t 

case  (-3)  : 

if  { spacacheck  ()  > 0)  /*  prooer  fornat  »/ 

( 

switch  (*lineptr) 

( case('I’):  break; 

case  ('  O' ) : * (progaddr  *■  1}  =|  cnes[2]; 
break ; 

default:  error (20)  ; 

} 

linept  r+  + ; 

if (get coara (2)  <— 9)  return; 
if  (getaaae  (r>ane)  <=  C) 

{ 

error  (25) ; 
return; 

} ' 

if  ( (c=f  indr.aae  (r.aae) ) ’ = ’o') 

{ 

error  (26) ; 
return; 

} 

♦progaddr  = | * (spar a=baseaddr  + 10  ♦ 71010)  ; 

else  error  (20 ) ; 
break; 
case  (29)  : 

/*  3C!*3  */ 
break; 

} 

progaddr  = » 2; 
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1 


return  (- 1}  ; 

} 

return  (1) ; 

} 

/*
 ****** ONE_OP ******

 arguments: row number of array inst where the
            instruction is found
 returns:   none
 notes:     routine to generate code for one operand
            instructions
*/

one_op (inum)
int inum;
{
    /* assign proper bit values to bits 1-3 */
    *progaddr =| 020000;

    /* set bits 16-19 */
    *(progaddr + 1) =| regcode[inum];

    /* get field id number, place in bits 4-15 */
    if (spaces () <= 0)
    {   error (13);
        return;
    }

    if (getfldid () <= 0) return;   /* bad field name */
    *progaddr =| *(spare=baseaddr + fldisp + FLDID);
    progaddr =+ 2;
}

/* 

******  0HI"_C?  ****** 

arguaents'  row  r.uaber  of  array  iast  where  the- 

instructioa  is  feuad 
return: . none 

notes:  routiaes  to  generate  code  for  2-operaad 

instructions 

V 

unit_o?  (inur) 
int  ir.ua; 

{ 

int  i; 

/*  as*agn  bits  1-3  proper  value  */ 

•progaddr  =1  010000; 

/»  set  bits  16-15  »/ 

•(pregaddr  + 1)  =!  ragcade[ inua]; 

/*  get  unit  id  nuaber  place  in  bits  s-15  */ 
if  ( getaaae (naae)  <=  0) 

( error  (25); 

return; 

) 

if  { (c=findnaaa  (naae)  ) ==  0) 

[ error  (25) ; 

retura; 

} 

•progaddr  =>  * (stare  = baseaddr  ♦ 10  * FLO  ID)  ; 
if(  inua  <=  ni) 

( progaddr  =*  2; 

if  (c  1=  'a')  error  (25) ; 
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} 

else 

{ 


svicch  (icua) 

( 

case  (S2) : /-  Cl 3 */ 

if  { c 1=  * a’  ZZ  c ! = 'c') 
casa(a3):  /*  OiiJM?  */ 

settypeO  ; 
progaddr  = * 2; 
break; 

case  (JiU)  ; /«  sdOM?  */ 

•{progaddr  + 1)  =(  01; 
if  ( c !*  ’a*)  error{23); 
progaddr  = + 2; 
for(i=1;  i<=2;  !•*•♦) 


error  [26) ; 


C 


V 


} 


/*  get  seacr ? boucdaries 
if  (getccoaa  (0)  <=  0) 

{ error (2b) ; 

retarn; 

1 . 

else 

if  ((tea?  = = 0}  zz  * (liceptr-l)  !=  '0') 
{ error  (2b); 

retara; 

) 

■progaddr**  = tea?; 


break; 

case  aS;  /•  7CT~  */ 

» (pragaddr  * 1)  =|  OS; 
if  ( c !=  «7<)  error  (23)  ; 
progaddr  = + 2; 
break;, 

case  ag;  /*  SS3  */ 

case  a7:  /*  ?£?  •/ 

settype  ()  ; 
progaddr  = ♦ 2; 
break; 

} 


argaseats: 

returns: 

cotes: 


•/ 

settjoe  () 

{ 


*****  SZTTtPZ 
none* 
none 

called  by  unite,  uses  the  last  twelve 
bits  of  * (progaddr  + ’)  i.  e-  opr.d2, 
to  indicate  unit  type  (ala,  epo,  etc.) 


* j 


switch ( c ) 

( 

case  ' a • : 

•(progaddr  * 1) 
break; 
case  'c' ; 

•{progaddr  * 1) 
break; 
case  ' b ' : 

•(progaddr  ♦ 1) 
break; 
case  ' 7' : 

•(progaddr  ♦ i) 
break; 
case  ' p * : 


= ! 01; 
■I  02; 

= | 02; 

= 1 0C; 
/*
 ****** CODEINST ******

 arguments: row number of array inst where the inst
            mnemonic is found
 returns:   none
 notes:     routine to generate code for "regular"
            instructions and assembly type fault
            injection instructions
*/

codeinst (inum)
int inum;
{
    if (inum >= 0 && inum <= 10)
        two_op (inum);
    else
    if (inum >= 11 && inum <= 23)
        imm_op (inum);
    else
    if (inum >= 24 && inum <= 28)
        branch_op (inum);
    else
    if (inum >= 29 && inum <= 34)
        one_op (inum);
    else
    if (inum >= 35 && inum <= 47)
        unit_op (inum);
    else
        other_op (inum);
}
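
The dispatch in codeinst is a classification of the instruction's row number into contiguous ranges. One way to make those ranges easier to audit is a small table, sketched below in present-day C. The range boundaries are the ones read from the listing above (0-10, 11-23, 24-28, 29-34, 35-47) and should be treated as uncertain where the print is unclear; the names `inst_format`, `format_ranges`, and `classify_inst` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum inst_format {
    FMT_TWO_OP, FMT_IMM_OP, FMT_BRANCH_OP,
    FMT_ONE_OP, FMT_UNIT_OP, FMT_OTHER_OP
};

struct format_range {
    int lo, hi;                 /* inclusive row-number bounds */
    enum inst_format fmt;
};

/* Range boundaries as read from the listing; assumed, not verified. */
static const struct format_range format_ranges[] = {
    {  0, 10, FMT_TWO_OP    },
    { 11, 23, FMT_IMM_OP    },
    { 24, 28, FMT_BRANCH_OP },
    { 29, 34, FMT_ONE_OP    },
    { 35, 47, FMT_UNIT_OP   },
};

/* Returns the code-generation format for instruction row `inum`;
   anything outside the table falls through to FMT_OTHER_OP. */
enum inst_format classify_inst(int inum)
{
    size_t i;

    assert(inum >= 0);
    for (i = 0; i < sizeof format_ranges / sizeof format_ranges[0]; i++)
        if (inum >= format_ranges[i].lo && inum <= format_ranges[i].hi)
            return format_ranges[i].fmt;
    return FMT_OTHER_OP;
}
```

Keeping the bounds in one table means a new instruction class changes data, not the chain of nested `else if` tests.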


/*
 ****** GETMEMNAME ******

 arguments: none
 returns:   none
 notes:     reads name of memory where program will be
            loaded and puts name in memload
*/

getmemname ()
{
    int letcount;       /* counts letters in memload to insure
                           there are <= 3 letters */

    mlptr = memload;

    while ((c = getchar ()) == ' ');

    *mlptr++ = c;
    letcount = 1;

    while ((c=getchar ()) != ' ' && c != '\n'
            && letcount++ <= 3)
        *mlptr++ = c;

    *mlptr = '\0';
}


arguaents: 

returns: 

totes; 


>*•  *A20  3 


none 

decodes  instructions,  places  events  on 
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queue,  and  executes  events 


*/ 

aain3  () 

{ 


} 

/* 


tiae  (tvec|  ; 
srand  (t7ec[  1 ])  ; 
duaay  = 0; 
lz  = duaay; 
duaay  = 1; 
longcne  = duaay; 
clock  = 0; 
pa  = progbeg; 
printq  (head)  ; 
while  ( oa  < progend) 

{ 

xe7ents  (clock)  ; 

if  ( pa  >=  progend)  /*  check  if  halt  */  break; 
cinst  ()  ; 

if  •(  fanltbit  ==  0)  clock  =+  EcCCOSr  If  Z ; 

} 

fiatiae  = 3273C; 
fiatiae  =*  fiatire; 
aevents  (fiatiae)  ; 


 ****** FLDADDR ******

 arguments: 12 bit operand field
 returns:   the address value as an integer
 notes:     generates unit type, unit index, and
            field index from the 12 bit operand
*/

fldaddr (opnd)
int opnd;
{   char *addr;

    unittype = opnd >> 9;
    unitindex = opnd >> 4 & 037;
    fldindex = opnd & 017;

    switch (unittype)
    {
    case 0:
        addr = &alu[unitindex];
        break;

    case 1:
        addr = &mem[unitindex];
        break;

    case 2:
        addr = &cpu[unitindex];
        break;

    case 3:
        addr = &bus[unitindex];
        break;

    case 4:
        addr = &vsd[unitindex];
        break;

    case 5:
        addr = &per[unitindex];
        break;
    }

    addr =+ 10 + fldindex * 2 * FLDSIZE;
    return (addr);
}
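
The 12-bit operand handled by fldaddr packs three fields: a 3-bit unit type, a 5-bit unit index, and a 4-bit field index. The sketch below shows only that unpacking step in present-day C; it does not attempt the address computation, which depends on tables and constants not shown here. The names `operand_fields` and `decode_operand` are hypothetical.

```c
#include <assert.h>

struct operand_fields {
    unsigned unit_type;     /* top 3 bits: 0=alu, 1=mem, 2=cpu, 3=bus, 4=vsd, 5=per */
    unsigned unit_index;    /* middle 5 bits */
    unsigned field_index;   /* low 4 bits */
};

/* Splits a 12-bit operand into its three packed fields. */
struct operand_fields decode_operand(unsigned opnd)
{
    struct operand_fields f;

    assert(opnd <= 07777u);         /* operand must fit in 12 bits */
    f.unit_type = opnd >> 9;
    f.unit_index = (opnd >> 4) & 037u;
    f.field_index = opnd & 017u;
    return f;
}
```

For example, the octal operand 02345 is binary 010 01110 0101, so it decodes to unit type 2 (cpu), unit index 14, field index 5.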

/*
 ****** PARSEWORD ******

 arguments: none
 returns:   none
 notes:     parses the instruction pointed to by pa
*/

parseword ()
{   faultbit = *pa >> 15 & 01;
    optype = *pa >> 12 & 07;
    opnd1 = *pa & 07777;
    opnd2 = *(pa + 1) & 07777;
    opspec = *(pa + 1) >> 12 & 017;
}
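
parseword splits a two-word instruction into five fields: the fault bit and operation type share the top four bits of the first word, the operation specification occupies the top four bits of the second word, and each word carries a 12-bit operand in its low bits. A minimal sketch in present-day C, assuming 16-bit words and the 12-bit operand mask 07777 (the mask for the first operand is partly illegible in the listing); `decoded_inst` and `decode_instruction` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct decoded_inst {
    unsigned faultbit;  /* word 0, bit 15 */
    unsigned optype;    /* word 0, bits 14-12 */
    unsigned opnd1;     /* word 0, low 12 bits */
    unsigned opspec;    /* word 1, bits 15-12 */
    unsigned opnd2;     /* word 1, low 12 bits */
};

/* Decodes a two-word instruction into its fields. */
struct decoded_inst decode_instruction(uint16_t w0, uint16_t w1)
{
    struct decoded_inst d;

    d.faultbit = (w0 >> 15) & 01u;
    d.optype = (w0 >> 12) & 07u;
    d.opnd1 = w0 & 07777u;
    d.opspec = (w1 >> 12) & 017u;
    d.opnd2 = w1 & 07777u;
    return d;
}
```

Masking after each shift matters on machines where right-shifting a signed word propagates the sign bit, which is why the original applies `& 01` and `& 07` as well.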

/*
 ****** CREATENODE ******

 arguments: none
 returns:   the "created" node if all is OK
            0 if all nodes have been used
 notes:     finds a node not in use and returns
            pointer to it
*/

struct node *createnode ()
{
    register struct node *nptr;
    register struct node *zptr;

    nptr = &nodes[0];
    zptr = &nodes[NUMNODES];

    while (nptr < zptr) {
        if (nptr->used == 0) {
            (nptr->used)++;
            return (nptr);
        }
        nptr++;
    }

    printf ("no free nodes--increase NUMNODES in define file\n");
    exit ();
}

/*
 arguments: pointer to a structure
 returns:   none
 notes:     resets the used field of qptr to zero
            to indicate node is available for use
*/

free (qptr) struct node *qptr;
{
    (qptr->used)--;
}

/*
 ****** INSERT ******

 arguments: two pointers to structures (see notes)
 returns:   none
 notes:     insert node 'newnode' before 'qnode',
            a node on the event queue
*/

insert (qnode, newnode)
struct node *qnode,
            *newnode;
{
    struct node *temptr;

    temptr = qnode->prev;
    newnode->prev = temptr;
    newnode->next = qnode;
    qnode->prev = newnode;
    temptr->next = newnode;
}

/*
 ****** INITQ ******

 arguments: none
 returns:   none
 notes:     creates the initial event queue with the
            header node and end node. The header node
            time value is -1, its pointer prev is 0.
            The clock value of the end node is 32,757,
            its pointer next is 0.
*/

initq ()
{   struct node *new;

    head = &headnode;
    head->time = -1;
    new = createnode ();
    dummy = 32750;
    new->time = dummy;
    new->time =* dummy;
    head->next = new;
    new->prev = head;
    head->prev = 0;
    new->next = 0;

    p = head->next;
}
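
The four pointer assignments in insert are the standard "insert before" operation on a doubly linked list, and the sentinel head and end nodes created by initq guarantee that the node being inserted before always has a predecessor. A minimal, self-contained sketch in present-day C follows; `qnode_t` and `insert_before` are hypothetical names, and the struct carries only the fields needed for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct qnode {
    long time;
    struct qnode *prev, *next;
} qnode_t;

/* Links `newnode` into the list immediately before `at`.
   `at` must already have a predecessor (true for every node after the
   header sentinel). */
void insert_before(qnode_t *at, qnode_t *newnode)
{
    qnode_t *before;

    assert(at != NULL && newnode != NULL && at->prev != NULL);
    before = at->prev;
    newnode->prev = before;
    newnode->next = at;
    at->prev = newnode;
    before->next = newnode;
}
```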


/*
 ****** QINST ******

 arguments: none
 returns:   none
 notes:     queues the instruction pointed
            to by global address pa
*/

qinst ()
{
    parseword ();

    /* place instruction on event queue */
    switch (optype)
    {
    case 0:     /* Two operand instruction */
        twoq (0);
        pa =+ 2;
        break;

    case 1:     /* Unit instruction */
        unitq ();
        pa =+ 2;
        break;

    case 2:     /* Single operand */
        oneq ();
        pa =+ 2;
        break;

    case 3:     /* Immediate operand */
        immq ();
        pa =+ 2;
        break;

    case 4:     /* Two operand indirect */
        twoq (1);
        pa =+ 2;
        break;

    case 5:     /* Branch */
        branchq ();
        break;

    case 6:
        otherq ();
        pa =+ 2;
        break;

    case 7:     /* Special fault injection */
        faultq ();
        pa =+ 2;
        break;
    }
}


/*
 arguments: all arguments needed to insert the event
            specified on the event queue
 returns:   none
 notes:     places specified event on event queue
            according to clock time. For equal clock
            times, it is placed at the bottom of the list
            for events of that clocktime
*/

qevent (tm,tp,sp,a1,a2,v,msk,fr)
long
    tm;     /* complete time */
int
    tp,     /* operation type */
    sp;     /* operation specification */
char
    *a1,    /* address 1 */
    *a2;    /* address 2 */
int
    v;      /* actual decimal value */
long
    msk;    /* mask for stuck-at f.i. instructions */
float
    fr;     /* frequency for random stuck at f.i. instructions */
{
    struct node *new;
    long *lptr1, *lptr2;
    int *iptr;

    new = createnode ();
    new->time = tm;
    new->type = tp;
    new->spec = sp;
    new->add1 = a1;
    new->add2 = a2;
    new->val = v;
    new->mask = msk;
    new->freq = fr;
    lptr1 = new->add1;
    lptr2 = new->add2;

    if (initialization) new->time = fltime;
    iptr = &new->mask;

    p = head->next;

    while (p->time <= new->time) p = p->next;
    insert (p,new);
}

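The queueing rule used by qevent and insert (walk from the header node until a node with a strictly later time is found, then link the new node in front of it) can be sketched in present-day C as follows. This is only an illustrative sketch: the names ev_node, ev_queue, evq_init, evq_insert_before and evq_add are hypothetical, the node carries far fewer fields than the listing's struct node, and LONG_MAX stands in for the end node's large time value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, reduced node: the listing's struct node has more fields. */
struct ev_node {
    long time;                      /* completion time of the event */
    int id;                         /* illustrative payload only */
    struct ev_node *prev, *next;
};

/* Queue with two sentinel nodes, mirroring the header node and end node. */
struct ev_queue {
    struct ev_node head;            /* time = -1, prev = NULL */
    struct ev_node tail;            /* time = LONG_MAX, next = NULL */
};

void evq_init(struct ev_queue *q)
{
    q->head.time = -1;
    q->head.id = 0;
    q->head.prev = NULL;
    q->head.next = &q->tail;
    q->tail.time = LONG_MAX;
    q->tail.id = 0;
    q->tail.prev = &q->head;
    q->tail.next = NULL;
}

/* Link node n immediately before node at (same pointer updates as insert). */
void evq_insert_before(struct ev_node *at, struct ev_node *n)
{
    struct ev_node *before = at->prev;

    n->prev = before;
    n->next = at;
    at->prev = n;
    before->next = n;
}

/* Place n by time; among equal times it goes after the existing events. */
void evq_add(struct ev_queue *q, struct ev_node *n)
{
    struct ev_node *p;

    /* The tail sentinel must compare strictly later than any real event. */
    assert(n->time >= 0 && n->time < LONG_MAX);

    p = q->head.next;
    while (p->time <= n->time)
        p = p->next;
    evq_insert_before(p, n);
}
```

Because the scan uses <= rather than <, an event queued with the same time as existing events lands behind them, which is the "bottom of the list" behaviour described in the QEVENT notes.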
/*
    ***** PRINTQ *****

    arguments:  pointer to node on queue
    returns:    none
    notes:      prints the values of the event queue
                starting at qptr
*/

printq (qptr)
struct node *qptr;
{   struct node *qp;
    long *lptr1, *lptr2;
    long l1, l2;
    int *iptr;

    qp = qptr;
    printf (" time type opspec add1 add2 value");
    printf (" mask freq\n\n");
    while (qp->next != tail)
    {   lptr1 = qp->add1;
        lptr2 = qp->add2;
        l1 = *lptr1;
        l2 = *lptr2;
        iptr = &qp->mask;
        printf ("%s %3d %3d ",
            locv (qp->time), qp->type, qp->spec);
        printf ("%s ", locv (l1));
        printf ("%s %5d %6o%6o %f\n", locv (l2), qp->val,
            iptr[0], iptr[1], qp->freq);
        qp = qp->next;
    }
}

/*
    ***** TWOQ *****

    arguments:  ind = 1 => indirect, = 0 => direct
    returns:    none
    notes:      enters a two operand instruction on the event
                queue. If the instruction is not fault
                injection, the associated operation time
                plus decode time is added to the clock to get
                event completion time. For fault injection
                instructions, completion time is the max
                of the last calculated completion times
                of the two fields used.
*/


twoq (ind)
int ind;
{   long *addr1, *addr2;
    char *cbase1, *cbase2;
    int itemp;

    /* check for indirection */
    if (ind == 1)
    {   opnd1 = *(spare = opnd1);
        opnd2 = *(spare = opnd2);
    }

    if (opspec >= 9)    /* memory reference instruction */
    {
        cbase1 = fldaddr (opnd1);
        addr1 = cbase1 + CLOCK;
        cbase2 = progbeg + opnd2;
        if (faultbit)
        {
            fintime = *addr1;
            qevent (fintime, optype, opspec, cbase1, cbase2, 0, lz, 0.0);
        }
        else
        {
            fintime = max (fintime, clock);
            if (fintime <= clock + DECODETIME)
                fintime =+ DECODETIME;
            fintime =+ FETCHTIME;
            qevent (fintime, optype, opspec, cbase1, cbase2, 0, lz, 0.0);
        }
        return;
    }

    cbase1 = fldaddr (opnd1);
    cbase2 = fldaddr (opnd2);
    addr1 = cbase1 + CLOCK;
    addr2 = cbase2 + CLOCK;
    fintime = max (*addr1, *addr2);
    if (faultbit)
    {
        *addr1 = fintime;
        *addr2 = fintime;
        qevent (fintime, optype, opspec, cbase1, cbase2, 0, lz, 0.0);
    }
    else
    {
        fintime = max (fintime, clock);
        if (fintime <= clock + DECODETIME)
            fintime =+ DECODETIME;
        if (opspec <= 6) addoptime ();
        else
        {
            switch (opspec)
            {
            case 7:
                fintime =+ MOVETIME;
                break;

            case 8:
                fintime =+ CMPTIME;
                break;
            }
        }
        if (!initialization)
        {
            *addr1 = fintime;
            *addr2 = fintime;
        }
        qevent (fintime, optype, opspec, cbase1, cbase2, 0, lz, 0.0);
    }
}

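The completion-time rule shared by the queueing routines for non-fault instructions (take the later of the field's last completion time and the clock, add the decode time when the result is no later than clock plus decode time, then add the operation time) can be written as one small function. This is a sketch under stated assumptions: completion_time and its parameter names are hypothetical, and the listing itself keeps these quantities in globals such as fintime and clock.

```c
/*
 * Hypothetical helper, not part of the listing.
 * field_time  : last calculated completion time of the operand field
 * clock       : current simulated clock
 * decode_time : instruction decode time (DECODETIME in the listing)
 * op_time     : time of the operation itself (e.g. an add or move time)
 */
long completion_time(long field_time, long clock,
                     long decode_time, long op_time)
{
    /* start from the later of the field time and the clock */
    long fin = field_time > clock ? field_time : clock;

    /* decode is charged only if it has not already been absorbed */
    if (fin <= clock + decode_time)
        fin += decode_time;

    /* then the operation time is added */
    fin += op_time;
    return fin;
}
```

With a clock of 100, a decode time of 2 and an operation time of 5, a field last completed at time 0 yields 107, while a field that is busy until time 200 yields 205 because the decode step is treated as already overlapped.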

/*
    ***** ONEQ *****

    arguments:  none
    returns:    none
    notes:      enters a one operand instruction on the
                event queue. If the instruction is not
                fault injection, the associated operation
                time plus decode time is added to the max
                of the clock value & the last field completion
                time to get event completion time. For fault
                injection instructions, completion time is
                the last calculated time of the field.
*/

oneq ()
{   long *addr;
    int *iptr;
    char *cbase;

    cbase = fldaddr (opnd1);
    addr = cbase + CLOCK;
    if (faultbit)
    {
        fintime = *addr;
        qevent (fintime, optype, opspec, cbase, 0, 0, lz, 0.0);
    }
    else
    {
        fintime = max (*addr, clock);
        if (fintime <= clock + DECODETIME)
            fintime =+ DECODETIME;
        switch (opspec)
        {
        case 0:
            fintime =+ NEGTIME;
            break;

        case 1:
            fintime =+ COMPTIME;
            break;

        case 2:
            fintime =+ SQRTIME;
            break;

        case 3:
            fintime =+ ABSTIME;
            break;

        case 4:
            fintime =+ CLRTIME;
            break;

        case 5:
            fintime =+ PARTIME;
            break;
        }
        if (!initialization) *addr = fintime;
        qevent (fintime, optype, opspec, cbase, 0, 0, lz, 0.0);
    }
}

/*
    ***** IMMQ *****

    arguments:  none
    returns:    none
    notes:      enters an immediate operand instruction
                on the event queue. If the instruction is
                not fault injection, the associated operation
                time plus decode time is added to the clock
                to get event completion time. For fault
                injection instructions, completion time is
                the last calculated completion time of
                the field
*/

immq ()
{   long *addr;
    char *cbase;
    int *iptr;

    cbase = fldaddr (opnd1);
    addr = cbase + CLOCK;
    if (faultbit)
    {
        fintime = *addr;
        qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
    }
    else
    {
        fintime = max (*addr, clock);
        if (fintime <= clock + DECODETIME)
            fintime =+ DECODETIME;
        if (opspec <= 6) addoptime ();
        else
        {
            switch (opspec)
            {
            case 7:
                fintime =+ LOADTIME;
                break;

            case 8:
                fintime =+ CMPTIME;
                condtime = fintime;
                break;

            case 9:
            case 10:
                fintime =+ LSHTIME;
                break;

            case 11:
            case 12:
                fintime =+ ASHTIME;
                break;
            }
        }
        if (!initialization) *addr = fintime;
        addr = cbase;
        qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
    }
}


/*
    ***** BRANCHQ *****

    arguments:  none
    returns:    none
    notes:      enters a branch instruction on the
                event queue
*/

branchq ()
{
    if ((opspec & 01) == 1)     /* indirect */
    {
        opnd1 = *(spare = &opnd1);
    }

    lcond = opnd2;
    if (faultbit)
    {   qevent (clock, optype, opspec, 0, 0, opnd1, lcond, 0.0);
        return;
    }

    if (opspec == 2 || opspec == 3)
    {
        clock = condtime;
        fintime = condtime;
        pa =+ 2;
    }
    else
        fintime = clock + DECODETIME;
    qevent (fintime, optype, opspec, 0, 0, opnd1, lcond, 0.0);
}


/*
    ***** UNITQ *****

    arguments:  none
    returns:    none
    notes:      enters a unit operand instruction
                on the event queue. If the
                instruction is not fault-injecting,
                fintime = max (max (all unit field
                times), clock) + assoc. operation
                time (+ DECODE if nec.). For
                fault-injection instructions,
                completion time is the max of the
                field times of the unit fields.
                Each field within the unit is
                then assigned the new time.
*/

unitq ()
{   int i;
    long *addr;
    char *cbase, *ca;

    cbase = fldaddr (opnd1);

    /* examine completion times of each field within
       the unit and compute the max */
    maxtime (cbase, numfields[opnd2]);
    fintime = mtime;

    if (faultbit)
    {
        qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
        for (i=0; i<= (numfields[opnd2]-1); i++)
        {
            addr = cbase + CLOCK + (i * FLDSIZE * 2);
            *addr = fintime;
        }
    }
    else
    {
        fintime = max (fintime, clock);
        if (fintime <= clock + DECODETIME)
            fintime =+ DECODETIME;
        if (opspec <= 6) addoptime ();
        else
        {   switch (opspec)
            {
            case 7:
                fintime =+ CX3TXS2;
                break;

            case 8:
            case 9:
                fintime =+ DUMPTIME;
                break;

            case 10:
                fintime =+ VOTETIME;
                break;
            }
        }
        qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
        for (i=0; i<= (numfields[opnd2]-1); i++)
        {
            addr = cbase + CLOCK + (i * FLDSIZE * 2);
            *addr = fintime;
        }
    }
}


/*
    ***** ADDOPTIME *****

    arguments:  none
    returns:    none
    notes:      adds common operation times to
*/

addoptime ()
{
    switch (opspec)
    {
    case 0:
    case 1:
        fintime =+ ADDTIME;
        break;

    case 2:
        fintime =+ MULTIME;
        break;

    case 3:
        fintime =+ DIVTIME;
        break;

    case 4:
        fintime =+ ANDTIME;
        break;

    case 5:
        fintime =+ ORTIME;
        break;

    case 6:
        fintime =+ XORTIME;
        break;
    }
}

/*
    ***** MAXTIME *****

    arguments:  address of first data field of unit,
                number of fields within unit
    returns:    the maximum completion time of all
                fields in the unit
    notes:      finds maximum completion time of all
                fields in the unit
*/

maxtime (cbase, fnum)
char *cbase; int fnum;
{   register int i;
    long *addr;
    char *ca;

    ca = cbase + CLOCK;
    mtime = lz;

    for (i=1; i <= fnum; i++)
    {
        if (mtime < *(addr = ca)) mtime = *(addr = ca);
        ca =+ FLDSIZE * 2;
    }
}

/*
    ***** OTHERQ *****

    arguments:  none
    returns:    none
    notes:      enters other instructions on event queue
                as specified by instruction
*/

otherq ()
{
    long *laddr;
    int i;
    char *cbase;

    cbase = fldaddr (opnd1);
    laddr = cbase + CLOCK;

    switch (opspec)
    {
    case 0:
    case 1:
        if (faultbit)
        {
            fintime = *laddr;
            qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
        }
        else
        {
            fintime = max (*laddr, clock);
            if (fintime <= clock + DECODETIME)
                fintime =+ DECODETIME;
            fintime =+ IOTIME;

            *laddr = fintime;
            for (i=1; i<=3; i++)
            {
                laddr = cbase + CLOCK + (i * FLDSIZE * 2);
                *laddr = fintime;
            }
            qevent (fintime, optype, opspec, cbase, 0, opnd2, lz, 0.0);
        }
        break;

    case 2: /* BOMB */
        printf ("\n\n\nPROGRAM BOMBED!!!\n\n");
        printf ("clock value = %s\n", locv (clock));
        exit ();
        break;
    }
}

/*
    ***** FAULTQ *****

    arguments:  none
    returns:    none
    notes:      enters a special fault injection
                instruction on the event queue. Completion
                time is the last calculated time of the
                field
*/

faultq ()
{   long *addr;
    int *iptr;
    char *cbase,
        *ca;
    long ltemp;
    float x, y,
        fr,     /* value of frequency */
        *pfr;   /* pointer to fr */

    cbase = fldaddr (opnd1);
    fintime = *(addr = ca = cbase + CLOCK);
    switch (opspec)
    {
    case 0: /* ?.X? */
        qevent (fintime, optype, opspec, cbase, 0, 0, lz, 0.0);
        break;

    case 1: /* D2 */
        pa =+ 2;
        qevent (fintime, optype, opspec, cbase, 0, *pa, lz, 0.0);
        break;

    case 2:
    case 3:
        pa =+ 2;
        iptr = &ltemp;
        *iptr++ = *pa;
        *iptr = *(pa + 1);
        qevent (fintime, optype, opspec, cbase, 0, 0, ltemp, 0.0);
        break;

    case 4:
    case 5: /* RSA1 */
        pfr = &fr;
        pa =+ 2;
        x = *(pa + 2);
        y = *(pa + 3);
        *pfr = x / y;
        iptr = &ltemp;
        *iptr++ = *pa;
        *iptr = *(pa + 1);
        qevent (fintime, optype, opspec, cbase, 0, 0, ltemp, *pfr);
        pa =+ 2;
        break;

    case 6: /* ?70 */
    case 7: /* 3?1 */
        pa =+ 2;
        iptr = &ltemp;
        *iptr++ = *pa;
        *iptr = *(pa + 1);
        qevent (fintime, optype, opspec, cbase, 0, 0, ltemp, 0.0);
        break;
    }
}


/*
    ***** XEVENTS *****

    arguments:  none
    returns:    none
    notes:      while the times of the events in the event
                queue are less than or equal to the clock,
                the events are executed as specified
*/

xevents (xtime)
long xtime;
{
    struct node *xptr;  /* pointer to node being executed */

    p = head->next;
    while (p->time <= xtime && p->next != NULL)
    {
        xptr = p;
        p = p->next;
        remove (xptr);
        switch (xptr->type)
        {
        case 0:
            twox (xptr);
            break;

        case 1:
            unitx (xptr);
            break;

        case 2:
            onex (xptr);
            break;

        case 3:
            immx (xptr);
            break;

        case 4:
            twox (xptr);
            break;

        case 5:
            branchx (xptr);
            break;

        case 6:
            otherx (xptr);
            break;

        case 7:
            faultx (xptr);
            break;
        }
    }
}

/*
    ***** REMOVE *****

    arguments:  pointer to node on the queue
    returns:    none
    notes:      removes node pointed to by qptr and
                places it on the array nodes
*/

remove (qptr)
struct node *qptr;
{   struct node
        *prevptr,
        *nextptr;

    prevptr = qptr->prev;
    nextptr = qptr->next;
    prevptr->next = nextptr;
    nextptr->prev = prevptr;
    qptr->prev = NULL;
    qptr->next = NULL;
    free (qptr);
}

/*
    ***** MAX *****

    arguments:  two positive integers
    returns:    the larger of the two integers
    notes:      compares the integers and returns the
                larger of the two
*/

max (i1, i2)
long i1, i2;
{
    if (i1 <= i2) return (i2);
    return (i1);
}

/*
    ***** IMMX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (immediate operand
                instruction) on node pointed to by
                xptr
*/

immx (xptr)
struct node *xptr;
{
    long *laddr, lval;
    int *iptr;

    laddr = xptr->add1;
    lval = xptr->val;
    switch (xptr->spec)
    {
    case 0:
        *laddr =+ lval;
        break;

    case 1:
        *laddr =- lval;
        break;

    case 2:
        *laddr =* lval;
        break;

    case 3:
        *laddr =/ lval;
        break;

    case 4:
        *laddr =& lval;
        break;

    case 5:
        *laddr =| lval;
        break;

    case 6:
        iptr = laddr;
        iptr[1] =^ xptr->val;
        break;

    case 7:
        *laddr = xptr->val;
        break;

    case 8:
        /* compare */
        injfault (xptr->add1);
        if (*laddr == lval) condcode=01;
        else
            if (*laddr < lval) condcode=02;
            else
                if (*laddr > lval) condcode=03;
        break;

    case 9:
        /* logical left */
        *laddr =* 2 * xptr->val;
        break;

    case 10:
        /* logical right */
        *laddr =/ (2 * xptr->val);
        break;

    case 11:
        /* arith left */
        *laddr =<< xptr->val;
        break;

    case 12:
        /* arith right */
        *laddr =>> xptr->val;
        break;
    }
    injfault (xptr->add1);
}

/*
    ***** TWOX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (two operand instruction)
                on node pointed to by xptr
*/


twox (xptr)
struct node *xptr;
{
    int *iptr1, *iptr2;
    long *laddr1, *laddr2;

    laddr1 = xptr->add1;
    laddr2 = xptr->add2;
    switch (xptr->spec)
    {
    case 0:
        *laddr1 =+ *laddr2;
        break;

    case 1:
        *laddr1 =- *laddr2;
        break;

    case 2:
        *laddr1 =* *laddr2;
        break;

    case 3:
        *laddr1 =/ *laddr2;
        break;

    case 4:
        *laddr1 =& *laddr2;
        break;

    case 5:
        *laddr1 =| *laddr2;
        break;

    case 6:
        iptr1 = laddr1;
        iptr2 = laddr2;
        *iptr1 =^ *iptr2;
        iptr1[1] =^ iptr2[1];
        break;

    case 7:
        *laddr1 = *laddr2;
        break;

    case 8:
        /* comparison */
        injfault (xptr->add1);
        intcmpr (*laddr1, *laddr2);
        break;

    case 9: /* LOS */
        *laddr1 = *laddr2;
        break;

    case 10:    /* STS */
        injfault (xptr->add1);
        *laddr2 = *laddr1;
        break;
    }
    injfault (xptr->add1);
}

/*
    ***** ONEX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (one operand instruction)
                on node pointed to by xptr
*/

onex (xptr)
struct node *xptr;
{
    long *laddr;

    laddr = xptr->add1;
    switch (xptr->spec)
    {
    case 0:
        *laddr = - (*laddr);
        break;

    case 1:

/*
    arguments:  an integer
    returns:    1 if integer of odd parity
                0 if integer of even parity
    notes:      finds and returns parity of the integer
*/

getpar (numb)
long numb;
{   int i, par;

    for (par = 0, i=1; i; i=i*2)
        if (i&numb) par++;
    return (par & 01);
}

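The parity routine above counts the one bits of its argument by walking a single-bit mask upward until the mask overflows to zero, then returns the low bit of the count. A minimal present-day sketch of the same idea follows; the name odd_parity is hypothetical, and the sketch uses an unsigned long mask so that the final shift is well defined, whereas the listing relies on a signed int mask.

```c
/*
 * Hypothetical modern equivalent of the parity routine.
 * Returns 1 if x has an odd number of one bits, 0 if even.
 */
int odd_parity(unsigned long x)
{
    int par = 0;
    unsigned long bit;

    /* bit visits every bit position; shifting past the top yields 0 */
    for (bit = 1; bit != 0; bit <<= 1)
        if (x & bit)
            par++;

    return par & 01;
}
```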
/*
    ***** BRANCHX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (branch operand instruction)
                on node pointed to by xptr
*/

branchx (xptr)
struct node *xptr;
{
    int *iptr;

    switch (xptr->spec)
    {
    case 0: /* branch unconditionally */
    case 1: /* indirect */
        pa = progbeg + xptr->val;
        break;

    case 2: /* conditionally */
    case 3: /* indirect */
        iptr = &xptr->mask;
        if (condcode & iptr[1])
            pa = progbeg + xptr->val;
        break;

    case 4: /* branch and save */
    case 5: /* indirect */
        pa = progbeg + xptr->val;
        regsave[rscount++] = pa + 2;
        break;

    case 6: /* HLT */
        pa = progend;
        printf ("\n\n\nProgram terminated normally\n\n");
        printf ("Clock = %s\n", locv (clock));
        break;

    case 8: /* RET */
        pa = regsave[--rscount];
        break;
    }
}

/*
    ***** FAULTX *****

    arguments:  pointer to node of the event to be
                executed
    returns:    none
    notes:      executes event (special fault injection
                instruction) on node pointed to by xptr
*/

faultx (xptr)
struct node *xptr;
{
    long *laddr;
    int *iaddr;
    char *base;
    float *flptr;

    base = xptr->add1;
    switch (xptr->spec)
    {
    case 0: /* 2.1? */
        *(iaddr = base + ERRTYPE) = 0;

        /* assign zeroes to s-a-1 mask, ones to
           s-a-0 mask */
        *(laddr = base + PSA1) = lz;
        *(laddr = base + RSA1) = lz;
        *(iaddr = base + PSA0) = 0177777;
        *++iaddr = 0177777;
        *(iaddr = base + RSA0) = 0177777;
        *++iaddr = 0177777;
        break;

    case 1: /* 02 */
        movenode (xptr);
        break;

    case 2: /* SA0 */
        *(iaddr = base + ERRTYPE) =| 01;
        if (*(laddr = base + PSA0) == 0)
            *laddr = -1;    /* initialize all bits to 1 */
        *laddr =& xptr->mask;
        break;

    case 3: /* SA1 */
        *(iaddr = base + ERRTYPE) =| 02;
        *(laddr = base + PSA1) =| xptr->mask;
        break;

    case 4: /* RSA0 */
        *(iaddr = base + ERRTYPE) =| 04;
        if (*(laddr = base + RSA0) == 0)
            *laddr = -1;    /* initialize all bits to 1 */
        *laddr =& xptr->mask;
        flptr = base + RSA0FREQ;
        *flptr = xptr->freq;
        break;

    case 5: /* RSA1 */
        *(iaddr = base + ERRTYPE) =| 010;
        *(laddr = base + RSA1) =| xptr->mask;
        flptr = base + RSA1FREQ;
        *flptr = xptr->freq;
        break;

case  6*.  /*  2?0  •/ 

if  ( (teap  = •(  iaddr  = base  + 22KT7P2)  C)  ) 
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} 

/* 


vwi  x . 


— w.onjc«Li  £]jQ 


return;  /*  no  faults  */ 
if  ( (tesp  S 0 12)  ==  0)  return;  /*  no  S?.0  */ 

* (iaddr  = base  + PSA3)  =|  xptr-Prask; 

if  ( * (ia  dd  r - base  + PS  A 0)  -=  012777  SS 

* (iaddr  = base  + PSAO  ♦ 1) ==01 77T77) /•no  core 
» (iadd  c = base  + =S  177775; 

* {Iaddr  = base  + PSAO)  =1  x?tr->oas:<; 

if  ( * (iadd;  = base  ♦ 3SA0)  -=  0177777  So 

•(iaddr  = base  ♦ PSAO  * 1)  ==  0177777) 

•(iaddr  = base  + iPH7T?~)  =S:*77772; 

break; 

case  7;  /»  3?1  »/ 

if  { (tea?  = *(iaadr  = base  + 2;27T?2))  ==  0) 
return.;  /•  no  faults  •/ 
iff  (teap  S 05)  =-  0)  return;  /*  no  3A1  "/ 
•{Iaddr  = base  + PSA1)  =5  r:tr->aasit; 
iff  * (Iaddr  = base  + PSA  1)  ==  0) 

* (iaddr  = base  * 2P3TJPE)  =5  0177775; 
iff  * (Iaddr  = base  + ?.SA  1)  =•=  0) 

•(iaddr  = base  ♦ 2PP7TP2)  =5  0177727; 

break; 


3 

in  jfault  (xptr->add1)  ; 

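The fault-injection instructions keep two masks per field: a stuck-at-0 mask that starts as all ones and is ANDed into the data, and a stuck-at-1 mask that starts as all zeroes and is ORed into the data. The sketch below shows how such a pair of masks forces bits of a stored value. It is an illustration only: apply_stuck_at and its parameters are hypothetical names, and a fixed 32-bit word is assumed in place of the listing's long fields.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the listing.
 * sa0_mask : a 0 bit marks a position stuck at 0 (all ones = no fault)
 * sa1_mask : a 1 bit marks a position stuck at 1 (all zeroes = no fault)
 */
uint32_t apply_stuck_at(uint32_t value, uint32_t sa0_mask, uint32_t sa1_mask)
{
    value &= sa0_mask;      /* force the stuck-at-0 positions low */
    value |= sa1_mask;      /* force the stuck-at-1 positions high */
    return value;
}
```

Initialising the masks to all ones and all zeroes respectively makes the helper a no-op until a fault is injected, which matches the listing's comment about assigning zeroes to the s-a-1 mask and ones to the s-a-0 mask.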

SA3«/ 


/*
    ***** MOVENODE *****

    arguments:  old event time, data address, new event time
    returns:    none
    notes:      finds event on event queue corresponding to
                old event time and data address, and
                moves it to new event time position on
                queue
*/

movenode (xptr)
struct node *xptr;
{
    long oet;   /* old event time */
    long *lptr;
    struct node *pt, *pt2, *saveptr;

    pt = head->next;
    oet = *(lptr = xptr->add1 + CLOCK);
    *lptr =+ xptr->val;

    while (pt->time != oet || pt->add1 != xptr->add1)
    {   if (pt->next == NULL)
            /* field is not on event queue */
            return;
        pt = pt->next;
    }

    saveptr = pt;
    saveptr->time = *lptr;
    pt = saveptr->prev;
    pt2 = saveptr->next;
    pt->next = pt2;
    pt2->prev = pt;

    while (pt->time <= saveptr->time) pt = pt->next;
    insert (pt, saveptr);
}

/*
    ***** UNITX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (unit operand instruction)
                on node pointed to by xptr
*/

unitx (xptr)
struct node *xptr;
{   long o1, o2;
    long *lptr1, *lptr2;
    long *rptr;     /* pointer to result */
    int *iptr1, *iptr2;

    if (xptr->spec <= 6)    /* instruction for alu only */
    {
        rptr = xptr->add1 + 3 * FLDSIZE * 2;
        o1 = *(lptr1 = xptr->add1);
        o2 = *(lptr2 = xptr->add1 + FLDSIZE * 2);
        switch (xptr->spec)
        {
        case 0:
            *rptr = o1 + o2;
            break;

        case 1:
            *rptr = o1 - o2;
            break;

        case 2:
            *rptr = o1 * o2;
            break;

        case 3:
            *rptr = o1 / o2;
            break;

        case 4:
            *rptr = o1 & o2;
            break;

        case 5:
            *rptr = o1 | o2;
            break;

        case 6:
            iptr1 = lptr1;
            iptr2 = lptr2;
            *iptr1 =^ *iptr2;
            iptr1[1] =^ iptr2[1];
            break;
        }
    }

    switch (xptr->spec)
    {
    case 7:
        break;

    case 8:
        break;

    case 9:
        dumpmem (xptr->add1);
        break;

    case 10:
        vote (xptr->add1);
        break;

    case 11:
        *(xptr->add1 + STAT) = 1;
        break;

    case 12:
        *(xptr->add1 + STAT) = 0;
        break;
    }
}


/*
    ***** VOTE *****

    arguments:  address of first register of vsd
    returns:    none
    notes:      compares the fields of the vsd for a
                majority value which is placed at the
                output. If no majority is found,
                the output register is set and
                the data output register set to zeroes
*/

vote (cbase)
char *cbase;
{
    int
        numcomp,    /* number of registers to compare */
        i, j,
        alike,      /* number of other values the field
                       agrees with */
        *iptr;
    long
        *sptr,      /* status pointer */
        *lptr1, *lptr2;

    iptr = cbase;
    numcomp = iptr[1];
    if (numcomp <= 0 || numcomp >= 9)
    {
        printf ("improper node size, instruction skipped\n");
        return;
    }

    sptr = cbase + (11 * FLDSIZE * 2);
    for (i=1; i <= (numcomp/2 + 1); i++)
    {
        lptr1 = cbase + FLDSIZE * 2 * i;
        printf ("data%1d=%s", i, locv (*lptr1));
        printf ("addr=%d", lptr1);
        alike = 1;
        for (j=i+1; j<=numcomp; j++)
        {
            if (*lptr1 == *(lptr2 = cbase + FLDSIZE * 2 * j))
                alike++;
        }
        if (alike > numcomp/2)  /* majority */
        {
            *sptr = lz;
            lptr2 = cbase + (10 * FLDSIZE * 2);
            *lptr2 = *lptr1;
            printf ("result = %s", locv (*lptr2));
            printf ("adresult=%d", lptr2);
            printf ("maj alike val=%s\n", locv (*lptr1));
            return;
        }
    }
    lptr2 = cbase + (10 * FLDSIZE * 2);
    *lptr2 = lz;
    *sptr = longone;
}

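The voter compares each candidate register with the registers after it and accepts the candidate as soon as it agrees with more than half of all registers; if no candidate qualifies, the status output is set and the data output is zeroed. A stand-alone sketch of that majority test follows. The function name majority_vote and its array interface are hypothetical; the listing works on fields laid out in simulated memory instead.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the listing.
 * Returns 1 and stores the majority value in *out when some value
 * occupies more than n/2 of the n entries; returns 0 otherwise and
 * leaves *out untouched.
 */
int majority_vote(const long *vals, size_t n, long *out)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        size_t alike = 1;               /* vals[i] agrees with itself */

        /* count later entries equal to the candidate */
        for (j = i + 1; j < n; j++)
            if (vals[i] == vals[j])
                alike++;

        if (alike > n / 2) {            /* strict majority found */
            *out = vals[i];
            return 1;
        }
    }
    return 0;                           /* no majority */
}
```

Counting only the entries after the candidate is sufficient because the first occurrence of a majority value sees all of its other occurrences later in the array.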
/*  ***** DUMPMEM *****

    arguments:  address of first register in memory
    returns:    none
    notes:      dumps the entire memory
*/

dumpmem (addr)
int *addr;
{
    dummy = 0;
}

/*
    ***** OTHERX *****

    arguments:  pointer to node of event to be executed
    returns:    none
    notes:      executes event (two operand instruction)
                on node pointed to by xptr
*/

otherx (xptr)
struct node *xptr;
{
    char *cbase;
    long *laddr;

    laddr = xptr->add1 + FLDSIZE * 2 * 3;
    switch (xptr->spec)
    {
    case 0:
    case 1:
        dummy = 1;
        *laddr = dummy;
        break;
    }
}

/*
    ***** INTCMPR *****

    arguments:  two integers, x and y
    returns:    none
    notes:      sets condition codes comparing x and y,
                i.e., x = y, x > y, x < y
*/

intcmpr (x, y)
int x, y;
{
    if (x == y) { condcode = 01; return;}
    if (x < y) { condcode = 02; return;}
    if (x > y) { condcode = 03; return;}
}

/*
    ***** INITFAULTS *****

    arguments:  none
    returns:    none
    notes:      gets name of fault initialization file,
                encodes the instruction (as in
                Phase-12) and decodes the instruction
                and places on event queue
*/

initfaults ()
{   int n;
    int space[6];
    int i;

    printf ("Expecting fault initialization file\n");
    openfile (&faultfile);
    initialization = 1;
    while (getline () != -1)
    {
        for (i=0; i<=5; i++) space[i] = 0;
        progaddr = &space[0];
        if (spaces () <= 0)
        {
            error (13);
            continue;
        }
        fltime = lz;
        while (*lineptr >= '0' && *lineptr <= '9')
        {
            fltime =* 10;
            fltime =+ *lineptr++ - '0';
        }
        if (spaces () <= 0)
        {   error (18);
            continue;
        }
        if (getname (name) <= 0)
        {   error (22);
            continue;
        }
        if (name[0] != '*')
        {   error (23);
            continue;
        }
        codefltinj ();
        pa = &space[0];
        qinst ();
    }
    initialization = 0;
}

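The fault initialization reader builds the fault time by scanning leading decimal digits of the input line, multiplying the running value by 10 and adding each digit. The same loop can be isolated as below; parse_time is a hypothetical name, and the sketch advances a caller-supplied pointer rather than a global line pointer.

```c
/*
 * Hypothetical helper, not part of the listing.
 * Reads leading decimal digits at *lineptr, returns their value,
 * and leaves *lineptr on the first non-digit character.
 * Returns 0 when the text does not start with a digit.
 */
long parse_time(const char **lineptr)
{
    long t = 0;

    while (**lineptr >= '0' && **lineptr <= '9') {
        t = t * 10 + (**lineptr - '0');   /* shift left one decimal place */
        (*lineptr)++;
    }
    return t;
}
```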

/*
    ***** INJFAULT *****

    arguments:  address of data field
    returns:    none
    notes:      checks whether a field is to
                have any permanent faults injected
                and injects the specified faults
*/

injfault (cbase)
char *cbase;
{
    long *mskptr, *lptr;
    int *iptr;
    float randnum, *freqptr;


if  ( 

{ 


'{iotr  = chase  + !=  0)  /-any  faults  p: 

lotr  = chase; 

if  ( »iptr  5 0 1)  /*  SA0  -/ 

( askptr  = chase  * ?3A0; 

r *l?tr  =5  *=sk?tr; 

} 

if  { *iptr  5 02)  /*  SA1  */ 

{ asJcptr  = chase  + ?SA*; 

*lptr  =5  *ask=tr; 

} 

if  (*iptr  s 0t)  /*  3SA0  •/ 

{ 

randnu  a = rand  ft  ; 
raadrui  =/  227  67; 
freaptr  = chase  * ?SA0r?I2; 
if  (randans  <=  *fregptr)  /*  inject 
{ 

askptr  = chase  ♦ 23  AO; 

•lptr  =5  »=st?tr; 

J 


?S  A0  */ 


} 

if 

t 


(*iptr  5 03) 


rand nu  a = rand  ft  ; 
rand ru s =/  327 67  ; 


0 


